24th LSI Design Contests-in Okinawa  Design Specification - 3-1

3-1. The sequence of reinforcement learning

In reinforcement learning, learning is carried out using four elements: "Environment", "Agent", "Action", and "Reward".

  Environment: The setting in which the agent exists
  Agent: The entity that acts
  Action: The behavior the agent performs
  Reward: The feedback given for an action

The sequence of reinforcement learning is shown in Fig. 1 below.

Fig. 1: The sequence of reinforcement learning



  1. At each time step t, the agent observes the current state s_t of the environment.
  2. The agent chooses the action a_t with the highest Q value among the actions it can take in the current state.
  3. The agent receives a reward that reflects how good or bad the result of the action was.
  4. The Q value (an indicator of the value of an action) for state s_t and action a_t is updated.
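
The four steps above can be sketched as a short loop. This is only an illustrative sketch, not the contest's reference design: the 5-state corridor environment, the names `step`, `greedy_action`, and `train`, and the values of `ALPHA` and `GAMMA` are all assumptions made for the example. This section does not give the update formula, so step 4 uses the standard Q-learning update, Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (r + gamma * max_a Q(s_t+1, a) - Q(s_t, a_t)).

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA = 0.5           # learning rate (assumed value)
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q, state, rng):
    """Step 2: pick the action with the highest Q value (ties broken at random)."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, max_steps=100, seed=0):
    rng = random.Random(seed)
    # Q table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                                        # step 1: observe the state
        for _ in range(max_steps):
            action = greedy_action(q, state, rng)        # step 2: choose the action
            next_state, reward, done = step(state, action)  # step 3: receive the reward
            # Step 4: standard Q-learning update of Q(s_t, a_t).
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q
```

Calling `train()` returns the learned Q table as a list of rows, one per state. In this sketch the random tie-breaking among equal Q values is the only source of exploration; it is enough for this tiny corridor, but that is a property of the toy example rather than something stated in this section.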