25th LSI Design Contest in Okinawa  Design Specification - 3-1

3-1. The sequence of deep reinforcement learning

The basic learning method is the same as Q-learning: learning is performed using four elements, "Environment", "Agent", "Action", and "Reward".

  Environment: the world in which the agent exists and acts
  Agent: the entity that takes actions
  Action: the behavior chosen by the agent
  Reward: the feedback the agent receives for an action

The overall flow of deep reinforcement learning and a conceptual image of it are shown in Fig. 1 and Fig. 2 below.
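The specification gives no code for this loop, so the following is only a minimal sketch of plain (tabular) Q-learning, assuming a hypothetical five-cell corridor environment that is not part of the design. It shows how the four elements fit together: `env_step` plays the Environment, `choose_action` is the Agent's epsilon-greedy policy, its return value is the Action, and the value returned by `env_step` is the Reward. The table is updated with the standard rule Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). All names and parameter values (`N_STATES`, `alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

# Hypothetical toy environment (an assumption, not from the specification):
# a 1-D corridor of N_STATES cells. The agent starts in cell 0; reaching the
# last cell ends the episode with reward +1, every other move gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def env_step(state, action):
    """Environment: return (next_state, reward, done) for the agent's action."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon):
    """Agent: epsilon-greedy choice over the Q-table row for `state`."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)  # explore
    best = max(q[state])
    # Exploit, breaking ties between equally valued actions at random.
    return random.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon)
            next_state, reward, done = env_step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    greedy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
    print("greedy action per cell (0 = left, 1 = right):", greedy)
```

Deep reinforcement learning keeps this same loop but replaces the table `q` with a neural network, as described next.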


Fig. 1: The sequence of deep reinforcement learning



Fig. 2: The image of deep reinforcement learning



Here, we explain an example that uses a neural network (NN) as the deep learning model.
Learning proceeds by repeating the following steps.

  1. At each time step t, the agent observes the current state of the environment and feeds it to the NN as input.
  2. The agent selects an action based on the NN's output for the state obtained in step 1.
  3. The agent receives a reward that indicates how good or bad the action was.
  4. An error function is constructed from the reward, and the NN's weights and biases are updated to reduce it.
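Steps 1 to 4 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the design's actual network: it uses a hypothetical corridor environment, a tiny fully connected NN (one-hot state input, one tanh hidden layer, one output per action), and, as the error function of step 4, the squared TD error E = 0.5 * (r + gamma * max_a' Q(s', a') - Q(s, a))^2 with the target held constant (for the final cell the target is just r). That error function is the usual choice in Q-learning-based DQN variants, but it is an assumption here, and the sketch omits the experience replay and target network of the original DQN. `QNetwork`, `env_step`, `N_HIDDEN`, and all hyperparameters are hypothetical names chosen for illustration.

```python
import math
import random

# Hypothetical toy environment (an assumption for illustration): a corridor of
# N_STATES cells; reaching the last cell ends the episode with reward +1.
N_STATES, N_ACTIONS, N_HIDDEN = 5, 2, 8  # actions: 0 = left, 1 = right


def env_step(state, action):
    """Environment: return (next_state, reward, done) for the chosen action."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def one_hot(state):
    """Encode the observed state as the NN input vector."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


class QNetwork:
    """Tiny NN: one-hot state -> tanh hidden layer -> one Q-value per action."""

    def __init__(self, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(N_STATES)] for _ in range(N_HIDDEN)]
        self.b1 = [0.0] * N_HIDDEN
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_ACTIONS)]
        self.b2 = [0.0] * N_ACTIONS

    def forward(self, x):
        """Return (hidden activations, Q-values) for input vector x."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        q = [sum(w * hj for w, hj in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, q

    def update(self, x, action, target, lr):
        """One gradient-descent step on E = 0.5 * (target - Q(x, action)) ** 2."""
        h, q = self.forward(x)
        delta = q[action] - target  # dE/dQ(x, action); target is held constant
        for j in range(N_HIDDEN):
            # Backpropagate through w2 before w2 itself is changed.
            grad_pre = delta * self.w2[action][j] * (1.0 - h[j] ** 2)
            self.w2[action][j] -= lr * delta * h[j]
            for i in range(N_STATES):
                self.w1[j][i] -= lr * grad_pre * x[i]
            self.b1[j] -= lr * grad_pre
        self.b2[action] -= lr * delta


def train(episodes=300, gamma=0.9, epsilon=0.1, lr=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    net = QNetwork(rng)
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            x = one_hot(state)                     # 1. observe the state, feed it to the NN
            _, q = net.forward(x)
            if rng.random() < epsilon:             # 2. pick an action from the NN output
                action = rng.randrange(N_ACTIONS)  #    (epsilon-greedy exploration)
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[a])
            state, reward, done = env_step(state, action)  # 3. receive the reward
            _, q_next = net.forward(one_hot(state))
            target = reward if done else reward + gamma * max(q_next)
            net.update(x, action, target, lr)      # 4. update weights and biases
            steps += 1
    return net


if __name__ == "__main__":
    net = train()
    policy = [max(range(N_ACTIONS), key=lambda a: net.forward(one_hot(s))[1][a])
              for s in range(N_STATES - 1)]
    print("greedy action per cell (0 = left, 1 = right):", policy)
```

The backward pass in `update` is written out by hand only to make step 4 explicit; a real design would normally use a framework or, for an LSI implementation, fixed-point hardware arithmetic.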