Reinforcement learning (RL) can use a Markov decision process (MDP) to define the transition probabilities. The Star Student Project is designing a VM capacity management agent as a distributed RL agent with a highly efficient representation of the Q-table. This Q-table representation is 100 times better than a plain look-up table.
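
To make the MDP and Q-table ideas concrete, here is a minimal single-agent sketch of tabular Q-learning on a toy VM-scaling problem. It is not the Star Student Project's design: the state (number of running VMs), the actions, the reward, and the names `transition_probs`, `step`, `greedy_action`, `q_learning`, `MAX_VMS`, `TARGET_VMS`, and `SUCCESS_PROB` are all assumptions made for illustration. The Q-table is a plain dictionary that stores only the (state, action) pairs actually visited, which is one simple way to avoid a dense look-up table and is not the project's representation.

```python
import random

# All constants below are hypothetical and chosen only for illustration.
MAX_VMS = 5            # assumed upper limit on running VMs
TARGET_VMS = 3         # assumed capacity the agent should settle at
ACTIONS = (-1, 0, 1)   # remove one VM, hold, add one VM
SUCCESS_PROB = 0.8     # assumed chance that a scaling request takes effect


def transition_probs(state, action):
    """Return P(s' | s, a) for the toy MDP as {next_state: probability}."""
    intended = min(MAX_VMS, max(0, state + action))
    if intended == state:
        return {state: 1.0}
    # The request succeeds with SUCCESS_PROB; otherwise the state is unchanged.
    return {intended: SUCCESS_PROB, state: 1.0 - SUCCESS_PROB}


def step(state, action, rng):
    """Sample a next state from transition_probs and compute its reward."""
    probs = transition_probs(state, action)
    next_state = rng.choices(list(probs), weights=list(probs.values()))[0]
    reward = -abs(next_state - TARGET_VMS)  # penalize distance from the target
    return next_state, reward


def greedy_action(q, state):
    """Pick the action with the highest Q-value; unseen pairs count as 0."""
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_learning(episodes=3000, steps=30, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy and a dict-based Q-table."""
    rng = random.Random(seed)
    q = {}  # sparse Q-table: only updated (state, action) pairs are stored
    for _ in range(episodes):
        state = rng.randint(0, MAX_VMS)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)       # explore
            else:
                action = greedy_action(q, state)   # exploit
            next_state, reward = step(state, action, rng)
            best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            # Standard Q-learning update toward the bootstrapped target.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    for s in range(MAX_VMS + 1):
        print(s, greedy_action(q, s))
```

In this toy setting the dictionary never holds more than (MAX_VMS + 1) * len(ACTIONS) entries, so the saving over a dense table is negligible. The sparse layout only pays off when the state space is large and most states are never visited.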