Implementation of Google's paper on playing Atari games using deep learning, in Python.
Paper Authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Paper Link:
This project presents an implementation of a model (based on the above linked paper) that learns control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. The model is tested on a variety of Atari and custom-made games, and its performance is compared with that of human players.
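The "variant of Q-learning" above trains the network to move Q(s, a) toward a one-step target: the reward plus the discounted value of the best next action, or just the reward when the episode has ended. The NumPy code below is a minimal sketch of that target and the squared TD error. The function names and the gamma = 0.99 default are illustrative assumptions. This is not the project's own code, which builds its network with Lasagne/Theano.

```python
import numpy as np


def q_learning_targets(rewards, next_q_values, terminals, gamma=0.99):
    """One-step targets y = r + gamma * max_a' Q(s', a'), or just r at episode end.

    rewards:       shape (batch,)
    next_q_values: shape (batch, num_actions), Q-values predicted for the next states
    terminals:     shape (batch,), True where the episode ended after this transition
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    not_done = 1.0 - np.asarray(terminals, dtype=np.float64)
    # Bootstrap from the best next action only if the episode continues.
    return rewards + gamma * next_q_values.max(axis=1) * not_done


def td_errors(q_values, actions, targets):
    """Target minus Q(s, a) for the action actually taken; the loss is its square."""
    q_values = np.asarray(q_values, dtype=np.float64)
    taken = q_values[np.arange(len(actions)), np.asarray(actions)]
    return targets - taken


# Example with a batch of two transitions and two actions.
targets = q_learning_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [1.0, 3.0]],
    terminals=[False, True],
    gamma=0.9,
)
errors = td_errors([[1.0, 2.0], [0.5, 0.25]], actions=[1, 0], targets=targets)
loss = np.mean(errors ** 2)
```

The second transition is terminal, so its target is its reward (0.0) with no bootstrapped term.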
- Python 2.7
- numpy
- Lasagne
- Theano
- matplotlib
- scipy
- Arcade Learning Environment (for Atari games)
- pygame (for Flappy Bird and Shooter)
- GPU with CUDA compute capability 3.0 or higher (refer and
The model was trained for two Atari games: Space Invaders and Breakout. Training took about 12-13 hrs, and the model achieves good performance consistent with the paper.
To run the agent:
python AtariGame-Breakout/ --rom_file breakout.bin --play_games 10 --display_screen --load_weights breakout_models/dep-q-rmsprop-breakout99-epoch.pkl
python AtariGame-SpaceInvaders/ --rom_file space_invaders.bin --play_games 10 --display_screen --load_weights spaceinvaders_models/dep-q-rmsprop-space_invaders99-epoch.pkl
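The Atari agents take screen pixels as input. The paper preprocesses each 210x160 RGB frame by converting it to grayscale and downsampling it to 84x84, then stacks the last 4 frames to form the network input. The code below is a minimal sketch of that pipeline. It simplifies the paper's downsample-and-crop step to a direct nearest-neighbour resize. The `preprocess` function and `FrameStack` class are hypothetical names, not this project's API.

```python
from collections import deque

import numpy as np

FRAME_SIZE = 84  # side length used in the DQN paper
HISTORY = 4      # number of most recent frames stacked as the network input


def preprocess(rgb_frame, size=FRAME_SIZE):
    """Convert an (H, W, 3) uint8 frame to a (size, size) grayscale array in [0, 1]."""
    frame = np.asarray(rgb_frame, dtype=np.float32)
    # Standard luminance weights for RGB -> grayscale; contracts the last axis.
    gray = frame @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.linspace(0, gray.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size).astype(int)
    small = gray[np.ix_(rows, cols)]
    return (small / 255.0).astype(np.float32)


class FrameStack:
    """Keeps the last `history` preprocessed frames as a (history, size, size) state."""

    def __init__(self, history=HISTORY):
        self.frames = deque(maxlen=history)  # oldest frame drops out automatically

    def reset(self, first_frame):
        # At the start of an episode, fill the history with copies of the first frame.
        processed = preprocess(first_frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return self.state()

    def push(self, frame):
        self.frames.append(preprocess(frame))
        return self.state()

    def state(self):
        return np.stack(list(self.frames), axis=0)
```

Stacking several frames lets the network infer motion, such as the direction of the ball in Breakout, which a single still frame cannot show.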
For Flappy Bird, I also trained a plain vanilla Q-learning agent that receives game-specific information, such as the x and y distances from the pipes. The purpose was to compare this game-specific model with a generalized model like the one described in Google's paper. Training takes about 2-3 hrs.
To run the agent:
python FlappyQ/
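The code below is a minimal sketch of a plain tabular Q-learning agent of this kind. It assumes the state is the (dx, dy) distance to the next pipe, bucketed into a coarse grid. The bucket size, learning rate `alpha`, discount `gamma`, and two-action encoding are hypothetical choices for illustration, not the values used in FlappyQ.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # hypothetical encoding: 0 = do nothing, 1 = flap


def discretize(dx, dy, bucket=10):
    """Map pixel distances to the next pipe onto a coarse grid cell."""
    return (int(dx // bucket), int(dy // bucket))


class TabularQAgent:
    def __init__(self, alpha=0.7, gamma=0.9, epsilon=0.0):
        # Unseen states start with Q = 0 for every action.
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy; ties go to the first action ("do nothing").
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        values = self.q[state]
        return max(ACTIONS, key=lambda a: values[a])

    def update(self, state, action, reward, next_state, done):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

Because the state is a couple of hand-picked distances instead of raw pixels, the table stays small. That is why an agent like this can train in hours rather than the much longer runs a pixel-based DQN needs.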
As with the Atari games, I trained the same model, with only minor modifications to the parameters, to play Flappy Bird. Its performance is not as good as that of the Q-learning model, which had explicit game data, but it still reaches a decent average score of about 20-30.
To run the agent:
python FlappyBirdDQN/ --play_games 10 --display_screen --load_weights flappy_models/dep-q-flappy-60-epoch.pkl
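The exploration schedule is one of the parameters a DQN exposes for tuning. The paper uses epsilon-greedy action selection, with epsilon annealed linearly from 1.0 to 0.1 over the first million frames and held at 0.1 afterwards. The code below is a minimal sketch of that schedule. The defaults are the paper's values. The README does not say whether FlappyBirdDQN changed them, and the function names are hypothetical.

```python
import numpy as np


def epsilon_at(step, start=1.0, end=0.1, anneal_steps=1000000):
    """Linearly anneal epsilon from `start` to `end` over `anneal_steps`, then hold it."""
    fraction = min(step / float(anneal_steps), 1.0)
    return start + fraction * (end - start)


def choose_action(q_values, step, rng):
    """Epsilon-greedy: a uniformly random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon_at(step):
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
action = choose_action(np.array([0.1, 0.7, 0.2]), step=2000000, rng=rng)
```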
This is a very simple game I made using pygame. The player controls a "spaceship" and must dodge the incoming "meteoroids" to stay alive as long as possible. I also tried a (silly?) experiment: I trained several models, each giving the agent a different degree of control over the spaceship, and compared their performance.
To run the agent with the 2-control setting (left and right):
python ShooterDQN/ --play_games 10 --display_screen --load_weights shooter_models/dep-q-shooter-nipscuda-2movectrl-80-epoch.pkl
To run the agent with the 4-control setting (left, right, up and down):
python ShooterDQN/ --play_games 10 --display_screen --load_weights shooter_models/dep-q-shooter-nipscuda-4movectrl-99-epoch.pkl
To run the agent with the 8-control setting (all directions):
python ShooterDQN/ --play_games 10 --display_screen --load_weights shooter_models/dep-q-shooter-nipscuda-8movectrl-99-epoch.pkl
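A DQN outputs one Q-value per available action. Changing the degree of control therefore changes the size of the network's output layer. The code below is a minimal sketch of the three control settings as sets of (dx, dy) movement vectors. `ACTION_SETS`, `apply_action`, the speed, and the screen size are hypothetical, not taken from ShooterDQN.

```python
# Hypothetical movement vectors (dx, dy) for each control setting.
# pygame screen coordinates: x grows to the right, y grows downward.
ACTION_SETS = {
    2: [(-1, 0), (1, 0)],                        # left, right
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],       # left, right, up, down
    8: [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)],                  # all eight directions
}


def num_outputs(num_controls):
    """A DQN emits one Q-value per action, so the output layer size is the set size."""
    return len(ACTION_SETS[num_controls])


def apply_action(position, action_index, num_controls, speed=5, screen=(640, 480)):
    """Move the ship one step in the chosen direction, clamped to the screen."""
    dx, dy = ACTION_SETS[num_controls][action_index]
    x = min(max(position[0] + dx * speed, 0), screen[0])
    y = min(max(position[1] + dy * speed, 0), screen[1])
    return (x, y)
```

A larger action set gives the agent more ways to escape but also more Q-values to learn from the same amount of experience. This trade-off is what comparing the three settings probes.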
In all of the graphs below, the X axis is the training timeline and the Y axis is the score for each game.
(Note: the scores in Shooter and Flappy Bird have been modified by amplifying the rewards. The original +1/-1 reward scheme does not fit these two games, because the player has no "lives" and rewards are very sparse.)
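For Atari, the paper keeps only the sign of each reward: positive rewards become +1, negative rewards become -1, and zero stays 0. The code below contrasts that clipping with a simple amplification on a sparse episode. It assumes "amplified" means multiplying by a constant. The scale of 100 and all function names are hypothetical, not the values used in this project.

```python
import numpy as np


def clip_rewards(rewards):
    """DQN-paper convention for Atari: positive -> +1, negative -> -1, zero stays 0."""
    return np.sign(np.asarray(rewards, dtype=np.float64))


def amplify_rewards(rewards, scale=100.0):
    """Hypothetical amplification for the sparse custom games (scale is illustrative)."""
    return np.asarray(rewards, dtype=np.float64) * scale


def discounted_return(rewards, gamma=0.99):
    """Sum over t of gamma**t * r_t for one episode."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return float(np.sum(rewards * gamma ** np.arange(len(rewards))))


# A sparse episode: nothing happens until the ship is hit on the last step.
episode = [0.0] * 9 + [-1.0]
small = discounted_return(clip_rewards(episode))     # about -0.91
large = discounted_return(amplify_rewards(episode))  # about -91.4
```

Multiplying every reward by a positive constant multiplies every Q-value by that constant. It therefore does not change which action is best. What it changes is the magnitude of the TD errors the optimizer sees, relative to settings such as the learning rate.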
The number of epochs and training cycles has been adjusted so that training any of the above models takes at most about 12-15 hrs, depending on your CPU and GPU (my CPU: i5 3.4 GHz; GPU: nVidia GeForce 660). Also, do not expect superhuman performance (as reported in Google's paper) from these models, since I trained them for only 12-15 hrs. More training and further parameter tuning can improve the scores in all of the above games.
The deep Q network used in this project is a modified version of spragunr's dqn code (
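Like the paper, a deep Q network trains from an experience replay memory. Transitions are stored as they happen, and the network is trained on random minibatches drawn from that memory. Sampling at random breaks the strong correlation between consecutive frames. The code below is a minimal sketch using a hypothetical `ReplayMemory` class. It is not taken from spragunr's code, which may store its data differently.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state, terminal) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, state, action, reward, next_state, terminal):
        self.buffer.append((state, action, reward, next_state, terminal))

    def sample(self, batch_size):
        """Uniform random minibatch, returned as five parallel tuples."""
        indices = random.sample(range(len(self.buffer)), batch_size)
        batch = [self.buffer[i] for i in indices]
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```

In a training loop, the agent calls `add` after every environment step. Once the memory holds enough transitions, it calls `sample` to build each minibatch for the Q-learning update.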