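# Represent the state of the board as a string.
# Each distinct board string is a "box".
# The MENACE system will be a dictionary keyed by those board strings.
# Pick a move arbitrarily from the box for the current board.
# Store all moves in a history and apply the reward at the end of each game.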
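# A minimal sketch of the notes above, not a definitive implementation.
# Assumptions that are not in the notes: the names Menace, choose, reward,
# legal_moves and play; a 9-character board string using " " for empty cells;
# INITIAL_BEADS = 3; and a weighted random draw as the "arbitrary" pick.

```python
import random

EMPTY = " "
INITIAL_BEADS = 3  # assumed starting weight for every legal move in a new box


def new_board():
    """Return an empty 3x3 board as a 9-character string."""
    return EMPTY * 9


def legal_moves(board):
    """Return the indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]


def play(board, move, mark):
    """Return a new board string with `mark` placed at index `move`."""
    return board[:move] + mark + board[move + 1:]


class Menace:
    def __init__(self, rng=None):
        # The whole system is a dictionary: board string -> {move: bead count}.
        self.boxes = {}
        # (board, move) pairs played during the current game.
        self.history = []
        self.rng = rng if rng is not None else random.Random()

    def choose(self, board):
        """Pick a move from the box for `board` and record it in the history."""
        box = self.boxes.setdefault(
            board, {m: INITIAL_BEADS for m in legal_moves(board)}
        )
        moves = list(box)
        weights = [box[m] for m in moves]
        if sum(weights) == 0:
            # Every bead was removed: fall back to a uniform pick.
            move = self.rng.choice(moves)
        else:
            move = self.rng.choices(moves, weights=weights)[0]
        self.history.append((board, move))
        return move

    def reward(self, delta):
        """Apply `delta` to every move played this game, then reset the history."""
        for board, move in self.history:
            self.boxes[board][move] = max(0, self.boxes[board][move] + delta)
        self.history.clear()
```

# Usage: call choose(board) on each of MENACE's turns (the board must still
# have an empty cell), then call reward(delta) once when the game ends.
# The size and sign of delta for a win, draw or loss are left to the caller.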