Relevant Research

Wang, Z. MooZi: A High-Performance Game-playing System that Plans with a Learned Model. (2022).

Zeyi’s master’s thesis. In it, he reviewed deep reinforcement learning algorithms, planning algorithms, and the distributed systems used to train AI to make decisions in complex environments. He also implemented such a system from scratch, demonstrating a deep understanding of these algorithms and systems down to the line-of-code level. The system he built, called MooZi, uses MuZero as its core learning algorithm and adapts to popular game interfaces such as OpenSpiel and the Arcade Learning Environment (Atari). Based on the experiments in the thesis, MooZi can learn to master various games from scratch with the same learning algorithm in less than two days. Zeyi passed his exam in October 2022, and his thesis received an award recommendation.

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

Silver, D. et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. Preprint at https://doi.org/10.48550/arXiv.1712.01815 (2017).

Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).

This is the AlphaGo line of research. AlphaGo was the first Go program to beat a human champion. It was trained on a massive amount of expert data using supervised learning, then fine-tuned using deep reinforcement learning and tree search. AlphaGo Zero showed that human expert data is unnecessary for mastering Go, and that the program learns even better by playing against itself from scratch. AlphaZero showed that the same core learning algorithm can be applied to Go, chess, and shogi, beating the previous state of the art in all of these games. MuZero, by removing the need to give the AI the game rules, showed that the core algorithm can be generalized even further to master a wider range of games. Zeyi’s thesis project included an implementation of MuZero.

Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

Mandhane, A. et al. MuZero with Self-competition for Rate Control in VP9 Video Compression. Preprint at http://arxiv.org/abs/2202.06626 (2022).

Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).

Reinforcement learning provides a framework for formulating a problem of interest as a sequential decision-making problem, much like a game. In these examples, reinforcement learning was used to control tokamak plasmas, optimize rate control in video compression, and discover new matrix multiplication algorithms.

In all these examples, by formulating the problem as a reinforcement learning game, the authors were able to train AI to achieve better performance than existing methods. This demonstrates the potential of reinforcement learning to tackle real-world problems in diverse fields, and analog layout could be one such problem.

Mirhoseini, A. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021).

This is an application of deep reinforcement learning in the microchip industry. In this paper, the authors describe how Google used deep reinforcement learning to optimize macro placement in IC design. This approach allowed the company to produce the design for its latest-generation Tensor Processing Unit (TPU) faster and with higher quality than human engineers.

This paper addresses the problem in the digital domain, while Astrus focuses on solving a similar problem in the analog domain. The digital place-and-route problem is already automated in existing workflows, and even if Google productizes its algorithm, it will offer only a 10% improvement over the existing alternative. Analog layout, however, is much harder, is still completely manual, and represents over 75% of the design effort.