Design Specification

1. Purpose
2. Design enviroment
3. Reinforcement learning
   3-1.The sequence of reinforcement learning
4. Maze exploration using reinforcement learning
   4-1.Q table
   4-2.Q - Learning
   4-3.Learning results
5. Challenge

24th LSI Design Contests-in Okinawa Design Specification - 4-2

4-2.Q - Learning

There are several methods for updating Q-value, and this example deals with Q-Learning (Following: Q Learning and Notation) which is one of them. The equation for Q value update in Q learning is shown below.

In the above equation, α, γ and r are hyperparameters.

The flow of Q value renewal is examined using an actual example.
The agent grasps that its current state is S1.
From Table 1, the action to advance to "↓" with the highest Q value is selected.
Since the transition destination becomes S6, a value of 0 is received as a reward.
Q value is updated based on the reward.

By repeating the above 1 ~ 5, the Q value table approaches the optimum one.

<<Back Next>>

Contents

Design Specification

24th LSI Design Contests-in Okinawa Design Specification - 4-2

4-2.Q - Learning