Machine Learning and Deep Learning: Simulation with Python – qlearning.py


Code from Chapter 2 of “Machine Learning and Deep Learning: Simulation with Python” has been rewritten for clarity.

Environment

  • Python 3.6.5 (Anaconda)

Problem Setup

This is the learning program for the maze-navigation task of Section 2.2.3.

As in the book, the Q-values are stored in a single list (q_value): 1 value for the first level \((=2^0)\), 2 for the second \((=2^1)\), 4 for the third \((=2^2)\), and 8 for the fourth \((=2^3)\), giving 15 values in total.

List Index Configuration

Level 1 (initial state): 0
Level 2: 1 2
Level 3: 3 4 5 6
Level 4: 7 8 9 10 11 12 13 14

Additionally, as in the book, index 14 is the goal. We store the goal's index in a constant named GOAL_POSITION, so you can experiment with different goal positions simply by changing this value.
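Since the list is a flattened binary tree, the children of index i sit at 2*i + 1 and 2*i + 2. The following is a small sketch of that layout; the helper names are ours, not the book's:

```python
# Hypothetical sketch of the flat-list state layout described above.
# Indices follow heap-style binary-tree numbering: the children of
# node i are 2*i + 1 and 2*i + 2.

GOAL_POSITION = 14  # goal index as in the text; change it to experiment

def children(i):
    """Return the (lower, upper) child indices of state i."""
    return 2 * i + 1, 2 * i + 2

def level_of(i):
    """Return the 1-based level of index i in the four-level tree."""
    level = 1
    while i > 0:
        i = (i - 1) // 2  # step up to the parent
        level += 1
    return level

q_value = [0.0] * 15  # one Q-value per node, levels 1-4 flattened

print(children(0))    # (1, 2)
print(level_of(14))   # 4
```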

Code

We’ve renamed some variables and simplified some conditional statements to make the code easier to follow. The code could be generalized to other hierarchical structures, but we kept it specific to this maze to avoid extra complexity.

We’ve introduced a constant named THRESHOLD: if the Q-values remain unchanged for this many consecutive iterations, we consider learning to have converged, and the program outputs the iteration at which the values first stopped changing. The number of iterations needed to converge depends on the values of ALPHA (learning rate), GAMMA (discount factor), and EPSILON (exploration rate).
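To make the convergence check concrete, here is a minimal sketch of a Q-learning loop over this tree with the THRESHOLD-based stopping rule. This is an illustration, not the book's actual qlearning.py: the parameter values, the REWARD constant, and all function names are assumptions.

```python
import random

ALPHA = 0.1       # learning rate (value assumed for illustration)
GAMMA = 0.9       # discount factor (assumed)
EPSILON = 0.3     # exploration probability (assumed)
THRESHOLD = 100   # unchanged iterations needed to declare convergence
GOAL_POSITION = 14
REWARD = 1000     # reward granted at the goal (assumed)

def select_action(q_value, state):
    """Epsilon-greedy choice between the two child states of `state`."""
    lower, upper = 2 * state + 1, 2 * state + 2
    if random.random() < EPSILON or q_value[lower] == q_value[upper]:
        return random.choice([lower, upper])
    return lower if q_value[lower] > q_value[upper] else upper

def update_q(q_value, next_state):
    """Q-learning update for the node the agent just moved to."""
    if next_state == GOAL_POSITION:
        target = REWARD                      # terminal: take the reward
    elif next_state >= 7:
        target = 0                           # non-goal leaf: nothing ahead
    else:
        lower, upper = 2 * next_state + 1, 2 * next_state + 2
        target = GAMMA * max(q_value[lower], q_value[upper])
    q_value[next_state] += ALPHA * (target - q_value[next_state])

def run_episode(q_value):
    """Walk from the root (index 0) down to a level-4 leaf."""
    state = 0
    while state < 7:                         # indices 0-6 have children
        next_state = select_action(q_value, state)
        update_q(q_value, next_state)
        state = next_state

def train(max_episodes=10000):
    """Train until the Q-values stay unchanged for THRESHOLD episodes."""
    q_value = [0.0] * 15
    unchanged = 0
    for episode in range(1, max_episodes + 1):
        before = list(q_value)
        run_episode(q_value)
        if q_value == before:
            unchanged += 1
            if unchanged >= THRESHOLD:
                # report the episode at which the values last changed
                return q_value, episode - THRESHOLD
        else:
            unchanged = 0
    return q_value, None                     # did not converge in time
```

With these (assumed) parameters, the Q-values along the path 0 → 2 → 6 → 14 settle toward 810, 900, and 1000; changing ALPHA, GAMMA, or EPSILON changes how quickly the unchanged-for-THRESHOLD condition is met.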