24th LSI Design Contest in Okinawa: Design Specification 3-1
3-1. The sequence of reinforcement learning
In reinforcement learning, learning is carried out using four elements: "Environment", "Agent", "Action", and "Reward".

- Environment: the setting in which the agent is placed
- Agent: the entity that acts
- Action: the behavior the agent takes
- Reward: the feedback given for an action

The sequence of reinforcement learning is shown in Fig. 1 below.
Fig. 1: The sequence of reinforcement learning
1. At each time step t, the agent observes the current state of the environment.
2. Among the actions available in the current state, the agent chooses the one with the highest value (that is, the highest Q value).
3. The agent receives a reward reflecting how good or bad the result of the action was.
4. The Q value (an indicator of the value of an action) for state s_t and action a_t is updated.
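The four steps above can be sketched as a short tabular loop. This is only an illustrative sketch, not the contest's reference design. This section does not state the update formula, so the code assumes the standard Q-learning rule Q(s_t, a_t) ← Q(s_t, a_t) + α (r + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)). The five-state corridor environment, the names `step`, `choose_action`, and `train`, and the values of `ALPHA` and `GAMMA` are all hypothetical.

```python
import random

# All constants below are assumptions made for illustration.
N_STATES = 5          # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)    # move left or move right
ALPHA = 0.5           # learning rate (assumed)
GAMMA = 0.9           # discount factor (assumed)


def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only when the goal is reached
    return next_state, reward, done


def choose_action(q, state, rng):
    """Pick the index of the action with the highest Q value (ties broken at random)."""
    best = max(q[state])
    candidates = [i for i, value in enumerate(q[state]) if value == best]
    return rng.choice(candidates)


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # Q table: one row per state, one column per action, initialised to zero.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Steps 1-2: observe the state and choose the highest-valued action.
            a = choose_action(q, state, rng)
            # Step 3: act and receive the reward from the environment.
            next_state, reward, done = step(state, ACTIONS[a])
            # Step 4: update Q(s_t, a_t) toward the assumed Q-learning target.
            future = 0.0 if done else GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (reward + future - q[state][a])
            state = next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(v, 3) for v in row])
```

Random tie-breaking stands in for exploration here: while all Q values in a state are equal the agent wanders, and once the goal reward has propagated back, the greedy choice always moves toward the goal. A practical design would likely add an explicit exploration strategy such as epsilon-greedy.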
Copyright (C) 2020-2021 LSI Design Contest. All Rights Reserved.