25th LSI Design Contests-in Okinawa Design Specification - 3-1
3-1. The sequence of deep reinforcement learning
The basic learning method is the same as Q-learning: learning is performed using the four elements "Environment", "Agent", "Action", and "Reward".

Environment: The setting in which the agent exists
Agent: The entity that acts
Action: The agent's behavior
Reward: The feedback given for an action

The flow of deep reinforcement learning and a conceptual diagram of it are shown in Fig. 1 and Fig. 2 below.
Fig 1 : The sequence of deep reinforcement learning
Fig 2 : The image of deep reinforcement learning
The following explains an example that uses a neural network (NN) as the deep learning algorithm.
Basically, learning proceeds by repeating the following steps.
1. At each time step t, the agent observes the current state of the environment and gives it to the NN as input.
2. The agent decides its action by referring to the output of the NN for the input given in step 1.
3. The agent receives a reward indicating whether the result of the action was good or bad.
4. An error function is created based on the reward, and the weights and biases of the NN are updated.
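
As a rough illustration of the four steps above, the following Python sketch trains a very small NN (a single linear layer with weights and biases) on a hypothetical five-cell corridor in which the agent is rewarded for reaching the rightmost cell. The environment, the names `env_step`, `q_values`, and `train`, the epsilon-greedy action selection, the hyperparameters, and the squared error built from the Q-learning target are all assumptions made for this example; they are not taken from the contest specification.

```python
import random

N_STATES = 5                 # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)           # action 0 = move left, action 1 = move right
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2   # assumed discount, learning rate, exploration rate


def one_hot(state):
    """Encode the state as the input vector of the NN."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def env_step(state, action):
    """Environment: apply the action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_values(weights, biases, x):
    """NN forward pass: one linear layer producing one Q-value per action."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * N_STATES for _ in ACTIONS]
    biases = [0.0 for _ in ACTIONS]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Step 1: observe the current state and give it to the NN.
            x = one_hot(state)
            q = q_values(weights, biases, x)
            # Step 2: decide the action from the NN output
            # (epsilon-greedy, ties broken at random).
            if rng.random() < EPSILON:
                action = rng.randrange(len(ACTIONS))
            else:
                best = max(q)
                action = rng.choice([a for a, v in enumerate(q) if v == best])
            # Step 3: receive the reward for the action.
            next_state, reward, done = env_step(state, action)
            # Step 4: build the error from the reward and update weights and biases.
            # The target is the Q-learning target; it is treated as a constant,
            # so gradient descent on 0.5 * error**2 gives the updates below.
            target = reward
            if not done:
                next_q = q_values(weights, biases, one_hot(next_state))
                target += GAMMA * max(next_q)
            error = target - q[action]
            for i in range(N_STATES):
                weights[action][i] += LR * error * x[i]
            biases[action] += LR * error
            state = next_state
            if done:
                break
    return weights, biases


if __name__ == "__main__":
    w, b = train()
    for s in range(N_STATES - 1):
        print(s, q_values(w, b, one_hot(s)))
```

Running the script prints the learned Q-values for each non-goal cell, which can be inspected to see which action the NN prefers in each state. A practical deep reinforcement learning setup would use a multi-layer network and typically further techniques that this sketch leaves out.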