Proceedings of the
9th International Conference of Asian Society for Precision Engineering and Nanotechnology (ASPEN2022)
15 – 18 November 2022, Singapore

Investigating Quantum Reinforcement Learning structure to the CartPole control task

Nguyen Truong Thu Ngo1,a, Tien-Fu Lu1, James Quach2 and Peter Bruza3

1School of Mechanical Engineering, Faculty of Sciences, Engineering and Technology, The University of Adelaide, SA 5000, Australia

2School of Physical Science, Faculty of Sciences, Engineering and Technology, The University of Adelaide, SA 5000, Australia

3School of Information Systems, Faculty of Science, Queensland University of Technology, QLD 4059, Australia


In recent years, reinforcement learning (RL) has proven effective at solving complex problems in a range of engineering applications, such as self-driving cars and industrial automation on production lines. Its applications also extend to other fields, including financial trading platforms and healthcare. Regardless of the technique or algorithm applied, reinforcement learning involves a trade-off between exploitation and exploration. More complex engineering systems often have large state and action spaces that can grow exponentially, and most RL techniques fail to compute optimal policies for these problems efficiently. This inefficiency leads to heavy computational requirements, which are often expensive. To tackle this challenge, quantum computational models have been studied, with varying levels of success. Since quantum computation was first proposed in the early 1980s, quantum algorithms have been improved and advanced to the point where, for certain problems, they provide exponential speedups over the best known classical solutions. Quantum computation offers potentially large improvements to traditional RL models because of its ability to exploit superposition and entanglement. In previous research, quantum variational circuits (QVCs) were proposed as alternatives to the neural networks commonly used in RL and were tested on several RL benchmarks in OpenAI Gym environments: CartPole, Acrobot and LunarLander. One such QVC was designed and tested to learn faster while using fewer trainable parameters than classical RL models. Our research investigates this QVC for balancing the pole in the CartPole problem, running on both a local PC simulator and a quantum computer. This research marks the first time a quantum RL model has learned the optimal policy that obtains the maximum expected reward on a control problem and effectively applied the trained parameters to balance the pole.
The system is first formulated as an RL environment with a state space and an action space. At each time step, the input data from the environment (the current state) is encoded into quantum states. Through the QVC, the algorithm learns to optimize the RL policy, which gives the probability of taking each action so as to obtain the optimal reward for each state-action pair. The trainable parameters of the QVC are optimized using gradient descent. The QVC design is vital to the success of the RL model; therefore, we vary the gates in the QVC and investigate the resulting model performance. We expect the quantum RL model to outperform classical RL in learning the optimal policy in terms of speed and computational resources. Successful control of the pole in a quantum simulation would be evidence that quantum models could offer real solutions to future computation-intensive problems where classical solutions are unaffordable.
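The pipeline described above (encode the state, evaluate a QVC, read out action probabilities, update the trainable parameters by gradient descent) can be sketched with a tiny pure-Python statevector simulator. This is a minimal sketch under stated assumptions, not the authors' circuit. The 4-qubit layout (one qubit per CartPole observation), the arctan angle encoding, the RY/CZ layer structure, reading the action from qubit 0, and the use of REINFORCE (a policy-gradient method standing in for the unspecified training algorithm) are all illustrative choices. Function names such as `run_circuit` and `reinforce_step` are hypothetical. Gradients use the parameter-shift rule, which gives exact derivatives for rotation gates of the form exp(-iθP/2) and can also be evaluated on quantum hardware.

```python
import math
import random

N_QUBITS = 4  # assumption: one qubit per CartPole observation
N_LAYERS = 2  # assumption: number of trainable layers


def apply_ry(state, qubit, theta):
    """Apply RY(theta) to `qubit` of a real statevector (qubit q is bit q of the index)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << qubit
    new = state[:]
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            new[i] = c * a0 - s * a1
            new[i | bit] = s * a0 + c * a1
    return new


def apply_cz(state, q1, q2):
    """Apply CZ: flip the sign of every amplitude where both qubits are |1>."""
    mask = (1 << q1) | (1 << q2)
    return [-a if (i & mask) == mask else a for i, a in enumerate(state)]


def run_circuit(obs, params):
    """Encode the observation, then apply the trainable layers; returns the final state."""
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0
    # Angle encoding (assumption): atan maps unbounded inputs into (-pi/2, pi/2).
    for q, x in enumerate(obs):
        state = apply_ry(state, q, math.atan(x))
    # Each layer: a trainable RY on every qubit, then a chain of CZ entanglers.
    for layer in params:
        for q, theta in enumerate(layer):
            state = apply_ry(state, q, theta)
        for q in range(N_QUBITS - 1):
            state = apply_cz(state, q, q + 1)
    return state


def expect_z0(obs, params):
    """Expectation of Pauli-Z on qubit 0 (amplitudes are real, so |a|^2 = a * a)."""
    state = run_circuit(obs, params)
    return sum(a * a * (1 if (i & 1) == 0 else -1) for i, a in enumerate(state))


def action_probs(obs, params):
    """pi(a|s): qubit 0 measured as 0 -> action 0 (push left), as 1 -> action 1 (push right)."""
    e = expect_z0(obs, params)
    return [(1 + e) / 2, (1 - e) / 2]


def grad_expect_z0(obs, params):
    """Parameter-shift rule: dE/dtheta = [E(theta + pi/2) - E(theta - pi/2)] / 2."""
    grads = [[0.0] * N_QUBITS for _ in params]
    for li in range(len(params)):
        for q in range(N_QUBITS):
            plus = [row[:] for row in params]
            minus = [row[:] for row in params]
            plus[li][q] += math.pi / 2
            minus[li][q] -= math.pi / 2
            grads[li][q] = (expect_z0(obs, plus) - expect_z0(obs, minus)) / 2
    return grads


def reinforce_step(obs, action, ret, params, lr=0.05):
    """One gradient-descent step on the REINFORCE loss -ret * log pi(action|obs)."""
    e = expect_z0(obs, params)
    de = grad_expect_z0(obs, params)
    # Chain rule: d log p0 / dE = 1 / (1 + E) and d log p1 / dE = -1 / (1 - E).
    dlogp_de = 1 / (1 + e) if action == 0 else -1 / (1 - e)
    return [[theta + lr * ret * dlogp_de * g for theta, g in zip(row, grow)]
            for row, grow in zip(params, de)]


random.seed(0)
params = [[random.uniform(-math.pi, math.pi) for _ in range(N_QUBITS)]
          for _ in range(N_LAYERS)]
obs = [0.02, -0.3, 0.05, 0.4]  # hypothetical (cart pos, cart vel, pole angle, pole ang. vel)
print("before:", action_probs(obs, params))
params = reinforce_step(obs, action=1, ret=1.0, params=params)
print("after: ", action_probs(obs, params))
```

With a positive return for action 1 and a sufficiently small learning rate, the step nudges the parameters so that pushing right becomes more likely. In a full training loop, `ret` would be the discounted return collected from CartPole episodes, for example from the `CartPole-v1` environment in Gym or Gymnasium. Varying the gates in `run_circuit` (for instance, changing the CZ chain or adding RZ rotations) is one way to explore the circuit-design question raised above; adding RZ gates would require a complex-valued statevector.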

Keywords: Reinforcement learning, Quantum, Quantum Variational Circuit, CartPole.
