24th LSI Design Contests-in Okinawa  Design Specification - 4-2

4-2. Q-Learning

There are several methods for updating the Q value; this example uses Q-Learning (hereafter written as Q learning), which is one of them. The update equation for the Q value in Q learning is shown below.

Q-learning-eq1

In the above equation, α (the learning rate) and γ (the discount factor) are hyperparameters, and r is the reward received from the environment.
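
The equation image is not reproduced in this text, so the standard textbook form of the Q-learning update is given here for reference. The time-indexed symbols (s_t, a_t, r_{t+1}) are an assumption and may differ from the notation in the contest's figure.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here the bracketed term is the difference between the new estimate (the reward plus the discounted best Q value of the next state) and the current Q value; α controls how far the current value moves toward that estimate.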


Hyper parameter

  1. The flow of a Q-value update is examined using a concrete example.
  2. The agent recognizes that its current state is S1.
  3. From Table 1, the action "↓", which has the highest Q value in S1, is selected.
  4. The transition destination is S6, so the agent receives a reward of 0.
  5. The Q value is updated based on the reward.

Q-learning-eq2
Q-table-after
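
Steps 2 to 5 above can be sketched in Python as a single update. This is a minimal sketch, not the contest's reference implementation: the Q values, the learning rate ALPHA and the discount factor GAMMA below are hypothetical placeholders, because the actual entries of Table 1 are not reproduced in this text.

```python
# Hypothetical hyperparameters (not taken from the specification).
ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor


def q_update(q, state, action, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the updated value."""
    best_next = max(q[next_state].values())   # max over actions in the next state
    target = reward + gamma * best_next       # reward plus discounted future value
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical excerpt of a Q table; "down" stands for the "↓" action.
q_table = {
    "S1": {"up": 0.0, "down": 1.0, "left": 0.0, "right": 0.5},
    "S6": {"up": 0.0, "down": 2.0, "left": 0.0, "right": 0.0},
}

state = "S1"                                             # step 2: current state
action = max(q_table[state], key=q_table[state].get)     # step 3: highest Q value
reward, next_state = 0.0, "S6"                           # step 4: move to S6, reward 0
new_q = q_update(q_table, state, action, reward, next_state)  # step 5
# With these placeholder numbers: 1.0 + 0.5 * (0 + 0.9 * 2.0 - 1.0) = 1.4
```

Even though the reward is 0, the Q value of ("S1", "down") changes, because the update also takes the best Q value of the destination state S6 into account.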

By repeating steps 1 to 5 above, the Q-value table approaches the optimal one.
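
The repetition can be illustrated with a small training loop. This is a sketch under several assumptions that do not come from the specification: a 5x5 grid (suggested only by the move from S1 to S6 on "↓"), a hypothetical goal in the last cell with reward 1, and an epsilon-greedy rule that occasionally picks a random action so that unexplored moves are tried. The text above describes only the greedy choice.

```python
import random

N = 5                    # assumed grid width: moving down from S1 reaches S6
GOAL = N * N - 1         # hypothetical goal cell (S25), index 24
ACTIONS = ["up", "down", "left", "right"]


def step(state, action):
    """Hypothetical environment: move on the grid, reward 1 only at the goal."""
    row, col = divmod(state, N)
    if action == "up":
        row = max(row - 1, 0)
    elif action == "down":
        row = min(row + 1, N - 1)
    elif action == "left":
        col = max(col - 1, 0)
    else:
        col = min(col + 1, N - 1)
    next_state = row * N + col
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Repeat the select / act / update cycle and return the learned Q table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]   # index 0 is S1
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))        # explore (assumed rule)
            else:
                best = max(q[state])                   # greedy, ties broken randomly
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if state == GOAL:                          # episode ends at the goal
                break
    return q


q_table = train()
```

After training, the entries for moves that lead directly into the goal cell approach 1, and the values of earlier states shrink by roughly a factor of γ per step of distance from the goal.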
