Campbell, M., Hoane, A. J. Jr & Hsu, F.-H. Deep Blue. Artif. Intell. 134, 57–83 (2002).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).
Machado, M. C. et al. Revisiting the Arcade Learning Environment: evaluation protocols and open problems for general agents. J. Artif. Intell. Res. 61, 523–562 (2018).
Silver, D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018).
Schaeffer, J. et al. A world championship caliber checkers program. Artif. Intell. 53, 273–289 (1992).
Brown, N. & Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 359, 418–424 (2018).
Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508–513 (2017).
Vlahavas, I. & Refanidis, I. Planning and Scheduling. Technical Report (EETN, 2013).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 2nd edn (MIT Press, 2018).
Deisenroth, M. & Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In Proc. 28th International Conference on Machine Learning, ICML 2011 465–472 (Omnipress, 2011).
Heess, N. et al. Learning continuous control policies by stochastic value gradients. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2944–2952 (MIT Press, 2015).
Levine, S. & Abbeel, P. Learning neural network policies with guided policy search under unknown dynamics. Adv. Neural Inf. Process. Syst. 27, 1071–1079 (2014).
Hafner, D. et al. Learning latent dynamics for planning from pixels. Preprint at https://arxiv.org/abs/1811.04551 (2018).
Kaiser, L. et al. Model-based reinforcement learning for Atari. Preprint at https://arxiv.org/abs/1903.00374 (2019).
Buesing, L. et al. Learning and querying fast generative models for reinforcement learning. Preprint at https://arxiv.org/abs/1802.03006 (2018).
Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Machine Learning, ICML Vol. 80 (eds Dy, J. & Krause, A.) 1407–1416 (2018).
Kapturowski, S., Ostrovski, G., Dabney, W., Quan, J. & Munos, R. Recurrent experience replay in distributed reinforcement learning. In International Conference on Learning Representations (2019).
Horgan, D. et al. Distributed prioritized experience replay. In International Conference on Learning Representations (2018).
Puterman, M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming 1st edn (John Wiley & Sons, 1994).
Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In International Conference on Computers and Games 72–83 (Springer, 2006).
Wahlström, N., Schön, T. B. & Deisenroth, M. P. From pixels to torques: policy learning with deep dynamical models. Preprint at http://arxiv.org/abs/1502.02251 (2015).
Watter, M., Springenberg, J. T., Boedecker, J. & Riedmiller, M. Embed to control: a locally linear latent dynamics model for control from raw images. In NIPS’15: Proc. 28th International Conference on Neural Information Processing Systems Vol. 2 (eds Cortes, C. et al.) 2746–2754 (MIT Press, 2015).
Ha, D. & Schmidhuber, J. Recurrent world models facilitate policy evolution. In NIPS’18: Proc. 32nd International Conference on Neural Information Processing Systems (eds Bengio, S. et al.) 2455–2467 (Curran Associates, 2018).
Gelada, C., Kumar, S., Buckman, J., Nachum, O. & Bellemare, M. G. DeepMDP: learning continuous latent space models for representation learning. In Proc. 36th International Conference on Machine Learning Vol. 97 Proceedings of Machine Learning Research (eds Chaudhuri, K. & Salakhutdinov, R.) 2170–2179 (PMLR, 2019).
van Hasselt, H., Hessel, M. & Aslanides, J. When to use parametric models in reinforcement learning? Preprint at https://arxiv.org/abs/1906.05243 (2019).
Tamar, A., Wu, Y., Thomas, G., Levine, S. & Abbeel, P. Value iteration networks. Adv. Neural Inf. Process. Syst. 29, 2154–2162 (2016).
Silver, D. et al. The predictron: end-to-end learning and planning. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 3191–3199 (JMLR, 2017).
Farahmand, A. M., Barreto, A. & Nikovski, D. Value-aware loss function for model-based reinforcement learning. In Proc. 20th International Conference on Artificial Intelligence and Statistics Vol. 54 Proceedings of Machine Learning Research (eds Singh, A. & Zhu, J.) 1486–1494 (PMLR, 2017).
Farahmand, A. Iterative value-aware model learning. Adv. Neural Inf. Process. Syst. 31, 9090–9101 (2018).
Farquhar, G., Rocktäschel, T., Igl, M. & Whiteson, S. TreeQN and ATreeC: differentiable tree-structured models for deep reinforcement learning. In International Conference on Learning Representations (2018).
Oh, J., Singh, S. & Lee, H. Value prediction network. Adv. Neural Inf. Process. Syst. 30, 6118–6128 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012).
He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In 14th European Conference on Computer Vision 630–645 (2016).
Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
Schmitt, S., Hessel, M. & Simonyan, K. Off-policy actor-critic with shared experience replay. Preprint at https://arxiv.org/abs/1909.11583 (2019).
Azizzadenesheli, K. et al. Surprising negative results for generative adversarial tree search. Preprint at http://arxiv.org/abs/1806.05780 (2018).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
OpenAI. OpenAI Five. OpenAI https://blog.openai.com/openai-five/ (2018).
Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).
Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. Preprint at https://arxiv.org/abs/1611.05397 (2016).
Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In European Conference on Machine Learning 282–293 (Springer, 2006).
Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).
Schadd, M. P., Winands, M. H., van den Herik, H. J., Chaslot, G. M. J.-B. & Uiterwijk, J. W. Single-player Monte-Carlo tree search. In International Conference on Computers and Games 1–12 (Springer, 2008).
Pohlen, T. et al. Observe and look further: achieving consistent performance on Atari. Preprint at https://arxiv.org/abs/1805.11593 (2018).
Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In International Conference on Learning Representations (2016).
Cloud TPU. Google Cloud https://cloud.google.com/tpu/ (2019).
Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In International Conference on Computers and Games 113–124 (2008).
Nair, A. et al. Massively parallel methods for deep reinforcement learning. Preprint at https://arxiv.org/abs/1507.04296 (2015).
Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at http://arxiv.org/abs/1908.09453 (2019).