Design Specification

1. Purpose
2. Design environment
3. Deep reinforcement learning
3-1. The sequence of deep reinforcement learning
4. Maze exploration using deep reinforcement learning
4-1. Learning with DQN
5. Challenge

25th LSI Design Contests-in Okinawa Design Specification - 3-1

3-1. The sequence of deep reinforcement learning

The basic learning method is the same as in Q-learning. Learning is performed using four elements: "Environment", "Agent", "Action", and "Reward".

Environment: the world in which the agent exists
Agent: the entity that acts
Action: the agent's behavior
Reward: the feedback the agent receives for an action

The overall flow and a conceptual diagram of deep reinforcement learning are shown in Fig. 1 and Fig. 2 below.
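To make the four elements concrete, here is a minimal sketch, assuming a hypothetical one-dimensional corridor instead of the contest's maze. The names `CorridorEnv` and `RandomAgent`, the reset/step interface, and the reward values (+1 at the goal, -0.01 per other step) are illustrative assumptions, not taken from this specification. The agent here does not learn yet; it only shows how Environment, Agent, Action, and Reward interact.

```python
import random


class CorridorEnv:
    """Hypothetical Environment: cells 0..length-1, goal at the right end."""

    ACTIONS = (-1, +1)  # action 0 = move left, action 1 = move right

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        """Put the agent back at the start cell and return the initial state."""
        self.pos = 0
        return self.pos

    def step(self, action):
        """Apply an Action and return (next_state, reward, done)."""
        move = self.ACTIONS[action]
        # Clamp the position so the agent cannot leave the corridor.
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        # Reward (assumed values): +1 at the goal, a small penalty otherwise.
        reward = 1.0 if done else -0.01
        return self.pos, reward, done


class RandomAgent:
    """Hypothetical Agent that picks actions uniformly at random (no learning)."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randrange(self.n_actions)


env = CorridorEnv(length=5)
agent = RandomAgent(n_actions=len(CorridorEnv.ACTIONS))
state = env.reset()
total_reward = 0.0
for t in range(100):
    action = agent.act(state)               # the Agent chooses an Action
    state, reward, done = env.step(action)  # the Environment returns a state and a Reward
    total_reward += reward
    if done:
        break
```

A learning agent would replace `RandomAgent.act` with a decision based on the network's output, while the environment interface stays the same.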

Fig. 1: The sequence of deep reinforcement learning

Fig. 2: Conceptual image of deep reinforcement learning

Here, we explain an example that uses a neural network (NN) as the deep learning model.
Learning proceeds by repeating the following steps:

1. At each time step t, the agent observes the current state of the environment and feeds it to the NN as input.
2. The agent decides on an action by referring to the NN's output for the input obtained in step 1.
3. The agent receives a reward that indicates how good or bad the result of the action was.
4. An error function is computed from the reward, and the weights and biases of the NN are updated to reduce it.
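The four steps can be sketched as follows. This is a minimal illustration under several assumptions: the environment is a hypothetical one-dimensional corridor (`CorridorEnv`), the "network" is the smallest possible one, a single linear layer with weights and biases mapping a one-hot state to one Q-value per action (`LinearQNet`), actions are chosen epsilon-greedily, and the error function is the squared difference between the target r + γ·max_a' Q(s', a') and Q(s, a). All names and hyperparameters are hypothetical. A full DQN would also use a deeper network, experience replay, and a target network, which are omitted here.

```python
import random


class CorridorEnv:
    """Hypothetical corridor environment; the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 0 = left, action 1 = right; the position is clamped to the corridor.
        self.pos = min(max(self.pos + (1 if action == 1 else -1), 0), self.length - 1)
        done = self.pos == self.length - 1
        return self.pos, (1.0 if done else -0.01), done


class LinearQNet:
    """One linear layer: one-hot state in, one Q-value per action out."""

    def __init__(self, n_states, n_actions):
        self.n_states = n_states
        self.w = [[0.0] * n_states for _ in range(n_actions)]  # weights, one row per action
        self.b = [0.0] * n_actions                             # biases, one per action

    def forward(self, state):
        x = [1.0 if i == state else 0.0 for i in range(self.n_states)]  # one-hot input
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]

    def update(self, state, action, target, lr):
        # Loss L = 0.5 * (target - Q(s, a))^2. Only the chosen action's output
        # affects L, so only its weights and bias receive a gradient step.
        error = target - self.forward(state)[action]
        self.w[action][state] += lr * error  # dL/dw = -error * x, x is one-hot
        self.b[action] += lr * error         # dL/db = -error
        return 0.5 * error ** 2


def train(episodes=200, gamma=0.9, lr=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = CorridorEnv(length=5)
    net = LinearQNet(n_states=env.length, n_actions=2)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(50):
            q = net.forward(state)                      # 1. feed the observed state to the NN
            if rng.random() < epsilon:                  # 2. choose an action from the NN output
                action = rng.randrange(2)               #    (epsilon-greedy exploration)
            else:
                action = max(range(2), key=lambda a: q[a])
            next_state, reward, done = env.step(action)  # 3. receive the reward
            # 4. build the target from the reward, then update weights and biases
            target = reward if done else reward + gamma * max(net.forward(next_state))
            net.update(state, action, target, lr)
            state = next_state
            if done:
                break
    return net


net = train()
# Greedy action (0 = left, 1 = right) for each non-goal cell after training.
policy = [max(range(2), key=lambda a: net.forward(s)[a]) for s in range(4)]
```

Because the input is one-hot, this linear layer behaves much like a Q-table; the point of the sketch is the loop structure, not the network's capacity.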
