Reinforcement Learning Sandbox
AI Snake Lab is an interactive reinforcement learning sandbox for experimenting with AI agents in a classic Snake Game environment — featuring a live Textual TUI interface, flexible replay memory database, and modular model definitions.
| Component | Description |
|---|---|
| Python 3.11+ | Core language |
| Textual | Terminal UI framework |
| SQLite3 | Lightweight replay memory + experiment store |
| PyTorch (optional) | Deep learning backend for models |
| Plotext / Matplotlib | Visualization tools |
The Dynamic Training checkbox enables an increasing amount of long-memory training at the end of each epoch. The `AIAgent.train_long_memory()` method is run an increasing number of times at the end of each epoch as the total number of epochs grows, up to a maximum of `MAX_DYNAMIC_TRAINING_LOOPS` (currently 16), according to the calculation shown below:
```python
if self.dynamic_training():
    loops = max(
        1, min(self.epoch() // 250, DAIAgent.MAX_DYNAMIC_TRAINING_LOOPS)
    )
else:
    loops = 1
while loops > 0:
    loops -= 1
    self.train_long_memory()
```
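For illustration, the same formula can be evaluated on its own to see how the loop count grows with the epoch number; the cap of 16 is first reached at epoch 4,000:

```python
# Standalone illustration of the adaptive loop count used above.
MAX_DYNAMIC_TRAINING_LOOPS = 16

def dynamic_training_loops(epoch: int) -> int:
    """Number of long-memory training passes run at the end of an epoch."""
    return max(1, min(epoch // 250, MAX_DYNAMIC_TRAINING_LOOPS))

for epoch in (100, 500, 1000, 2000, 4000, 10000):
    print(epoch, dynamic_training_loops(epoch))
# 100 -> 1, 500 -> 2, 1000 -> 4, 2000 -> 8, 4000 and beyond -> 16
```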
The Epsilon N class, inspired by Richard Sutton’s The Bitter Lesson, is a drop-in replacement for the traditional Epsilon Greedy algorithm. It instantiates a traditional Epsilon Greedy instance for each score. For example, when the AI has a score of 7, the Epsilon Greedy instance at `self._epsilons[7]` is used.
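The actual implementation lives in the project source; as a rough sketch of the idea (the class and parameter names below are illustrative, not the project’s API), Epsilon N can be pictured as a lazily populated dictionary of per-score Epsilon Greedy instances:

```python
import random


class EpsilonGreedy:
    """Simple decaying epsilon-greedy helper (illustrative only)."""

    def __init__(self, epsilon: float = 1.0, decay: float = 0.995, minimum: float = 0.01):
        self.epsilon = epsilon
        self.decay = decay
        self.minimum = minimum

    def explore(self) -> bool:
        """Return True when a random (exploratory) move should be taken, then decay."""
        explore = random.random() < self.epsilon
        self.epsilon = max(self.minimum, self.epsilon * self.decay)
        return explore


class EpsilonN:
    """One EpsilonGreedy instance per score level, so exploration decays locally."""

    def __init__(self):
        self._epsilons: dict[int, EpsilonGreedy] = {}

    def explore(self, score: int) -> bool:
        # Lazily create an EpsilonGreedy instance the first time a score is seen.
        if score not in self._epsilons:
            self._epsilons[score] = EpsilonGreedy()
        return self._epsilons[score].explore()
```

With this structure, reaching a new score level starts with a fresh, fully exploratory epsilon, while already-mastered levels keep their decayed values.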
The screenshot below shows the Highscores plot with the traditional, single-instance Epsilon Greedy algorithm and with Dynamic Training enabled. The single-instance epsilon algorithm quickly converges but shows uneven progress. The abrupt jumps in the high-score line indicate that exploration declines before the AI fully masters each stage of play.
The next screenshot shows the Highscores plot with Dynamic Training disabled and Epsilon N in use. By maintaining a separate epsilon for each score level, Epsilon N sustains exploration locally. This results in a smoother, more linear high-score curve as the AI consolidates learning at each stage before progressing.
As shown above, the traditional Epsilon Greedy approach leads to slower improvement, with the high-score curve flattening early. By contrast, Epsilon N maintains steady progress as the AI masters each score level independently.
This project is on PyPI. You can install the AI Snake Lab software using `pip`.
Create and activate a virtual environment:

```bash
python3 -m venv snake_venv
. snake_venv/bin/activate
```
After you have activated your venv environment:
```bash
pip install ai-snake-lab
```
From within your venv environment:
```bash
ai-snake-lab
```
Because the simulation runs in a Textual TUI, terminal resizing and plot redraws can subtly affect the simulation timing. As a result, highscore achievements may appear at slightly different game numbers across runs. This behavior is expected and does not indicate a bug in the AI logic.
If you need repeatable simulation runs, drop me a note and I can implement a headless mode where the simulation runs in a separate process outside of Textual.
The original code for this project was based on a YouTube tutorial, Python + PyTorch + Pygame Reinforcement Learning – Train an AI to Play Snake by Patrick Loeber. You can access his original code here on GitHub. Thank you, Patrick!!! You are amazing!!!! This project is a port of his Pygame and Matplotlib solution.
Thanks also go out to Will McGugan and the Textual team. Textual is an amazing framework. Talk about Rapid Application Development: porting this from a Pygame and Matplotlib solution to Textual took less than a day.
Creating an artificial intelligence agent, letting it loose and watching how it performs is an amazing process. It’s not unlike having children, except on a much, much, much smaller scale, at least today! Watching the AI driven Snake Game is mesmerizing. I’m constantly thinking of ways I could improve it. I credit Patrick Loeber for giving me a fun project to explore the AI space.
Much of my career has been as a Linux systems administrator. My comfort zone is the command line. I’ve never worked as a programmer, and certainly not as a front-end developer. Textual, as a framework for building rich Terminal User Interfaces, is exactly my speed, and when I saw Dolphie, I was blown away. Built-in, real-time plots of MySQL metrics: amazing!
Richard S. Sutton is also an inspiration to me. His thoughts on Reinforcement Learning are a slow-motion revolution. His criticism of the existing AI landscape, with its focus on engineering a specific AI to do a specific task and then considering the job done, is spot on. His vision for an AI agent that does continuous, non-linear learning remains the next frontier on the path to Artificial General Intelligence.