Multi-Agent Reinforcement Learning using the Collective Intrinsic Motivation

Authors: Bolshakov V.E., Sakulin S.A., Alfimtsev A.N.  Published: 15.01.2024
Published in issue: #4(145)/2023  
DOI: 10.18698/0236-3933-2023-4-61-84

Category: Informatics, Computer Engineering and Control | Chapter: Mathematical Support and Software for Computers, Computer Complexes and Networks  
Keywords: multi-agent reinforcement learning, deep learning, intrinsic reward


One of the serious problems facing reinforcement learning is the sparsity of rewards from the environment. Solving it requires effective methods of environment exploration, and the intrinsic motivation principle is one of the approaches to building such methods. Most real-world problems are characterized by sparse rewards; in addition, there are multi-agent environments where conventional intrinsic motivation methods fail to give satisfactory results. Applied problems at the intersection of these two difficulties, i.e., multi-agent environments with sparse rewards, are currently in demand. To solve such problems, the CIMA (Collective Intrinsic Motivation of Agents) method is proposed; it combines multi-agent learning algorithms with intrinsic motivation models and uses both the external reward from the environment and the internal collective reward of the cooperative multi-agent system. Moreover, the CIMA method can use any neural-network multi-agent learning algorithm as the base reinforcement learning algorithm. Experiments were carried out in a specially prepared multi-agent environment with sparse rewards based on SMAC; the efficiency of the proposed method is supported by a comparative analysis against modern methods of multi-agent intrinsic motivation.
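The core idea of blending an external environment reward with a collective intrinsic bonus can be sketched as follows. This is a minimal illustration, not the paper's actual CIMA formulation: the aggregation of per-agent novelty bonuses into a collective term (here, a simple mean) and the weighting coefficient `beta` are assumptions for illustration only.

```python
def combined_reward(extrinsic, intrinsic_per_agent, beta=0.1):
    """Blend a shared extrinsic reward with a collective intrinsic bonus.

    extrinsic           -- sparse reward from the environment (shared by the team)
    intrinsic_per_agent -- list of per-agent exploration bonuses (e.g., novelty scores)
    beta                -- weight of the intrinsic term (hypothetical hyperparameter)

    The collective term here is the mean of the per-agent bonuses; the
    published CIMA method may aggregate them differently.
    """
    if not intrinsic_per_agent:
        return extrinsic
    collective = sum(intrinsic_per_agent) / len(intrinsic_per_agent)
    return extrinsic + beta * collective
```

With sparse extrinsic rewards (often zero), the intrinsic term dominates early training and drives exploration; once the team starts collecting environment rewards, the extrinsic term takes over.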

The research carried out by Sakulin S.A. and Alfimtsev A.N. was supported by the RSF grant no. 22-21-00711.

Please cite this article in English as:

Bolshakov V.E., Sakulin S.A., Alfimtsev A.N. Multi-agent reinforcement learning using the collective intrinsic motivation. Herald of the Bauman Moscow State Technical University, Series Instrument Engineering, 2023, no. 4 (145), pp. 61--84 (in Russ.). DOI: https://doi.org/10.18698/0236-3933-2023-4-61-84


[1] Singh S., Lewis R.L., Barto A.G., et al. Intrinsically motivated reinforcement learning: an evolutionary perspective. IEEE Trans. Auton. Mental Develop., 2010, vol. 2, no. 2, pp. 70--82. DOI: https://doi.org/10.1109/TAMD.2010.2051031

[2] Mnih V., Kavukcuoglu K., Silver D., et al. Human-level control through deep reinforcement learning. Nature, 2015, vol. 518, no. 7540, pp. 529--533. DOI: https://doi.org/10.1038/nature14236

[3] Silver D., Huang A., Maddison C., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016, vol. 529, pp. 484--489. DOI: https://doi.org/10.1038/nature16961

[4] Vinyals O., Babuschkin I., Czarnecki W.M., et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 2019, vol. 575, pp. 350--354. DOI: https://doi.org/10.1038/s41586-019-1724-z

[5] El-Sallab A.A., Abdou M., Perot E., et al. Deep reinforcement learning framework for autonomous driving. arXiv:1704.02532. DOI: https://doi.org/10.48550/arXiv.1704.02532

[6] Yang Y. Many-agent reinforcement learning. PhD thesis. London, University College London, 2021.

[7] Wiering M. Multi-agent reinforcement learning for traffic light control. ICML, 2000, pp. 1151--1158.

[8] Zheng L., Cheng J., Wang J., et al. Episodic multi-agent reinforcement learning with curiosity-driven exploration. NeurIPS, 2021, vol. 1, pp. 3757--3769.

[9] Bellemare M.G., Naddaf Y., Veness J., et al. The arcade learning environment: an evaluation platform for general agents (extended abstract). IJCAI, 2015, pp. 4148--4152.

[10] Aubret A., Matignon L., Hassas S. A survey on intrinsic motivation in reinforcement learning. arXiv:1908.06976. DOI: https://doi.org/10.48550/arXiv.1908.06976

[11] Samvelyan M., Rashid T., de Witt C.S., et al. The StarCraft multi-agent challenge. arXiv:1902.04043. DOI: https://doi.org/10.48550/arXiv.1902.04043

[12] Efroni Y., Mannor S., Pirotta M. Exploration-exploitation in constrained MDPs. arXiv:2003.02189. DOI: https://doi.org/10.48550/arXiv.2003.02189

[13] Jiang J., Lu Z. The emergence of individuality. PMLR, 2021, vol. 139, pp. 4992--5001.

[14] Martin J., Sasikumar S.N., Everitt T., et al. Count-based exploration in feature space for reinforcement learning. IJCAI, 2017, pp. 2471--2478. DOI: https://doi.org/10.24963/ijcai.2017/344

[15] Burda Y., Edwards H., Storkey A., et al. Exploration by random network distillation. arXiv:1810.12894. DOI: https://doi.org/10.48550/arXiv.1810.12894

[16] Machado M.C., Bellemare M.G., Bowling M. Count-based exploration with the successor representation. arXiv:1807.11622. DOI: https://doi.org/10.48550/arXiv.1807.11622

[17] Tang H., Houthooft R., Foote D., et al. #Exploration: a study of count-based exploration for deep reinforcement learning. NIPS, 2017, vol. 1, pp. 2754--2763.

[18] Charoenpitaks K., Limpiyakorn Y. Multi-agent reinforcement learning with clipping intrinsic motivation. Int. J. Mach. Learn., 2022, vol. 12, no. 3, pp. 85--90. DOI: https://doi.org/10.18178/ijmlc.2022.12.3.1084

[19] Oh J., Guo X., Lee H., et al. Action-conditional video prediction using deep networks in Atari games. NIPS, 2015, pp. 2863--2871.

[20] Savinov N., Raichuk A., Marinier R., et al. Episodic curiosity through reachability. arXiv:1810.02274. DOI: https://doi.org/10.48550/arXiv.1810.02274

[21] Fu J., Co-Reyes J., Levine S. EX2: exploration with exemplar models for deep reinforcement learning. NIPS, 2017, pp. 2577--2587.

[22] Kim Y., Nam W., Kim H., et al. Curiosity-bottleneck: exploration by distilling task-specific novelty. PMLR, 2019, pp. 3379--3388.

[23] Kim H., Kim J., Jeong Y., et al. EMI: exploration with mutual information. ICML, 2019, vol. 97, pp. 5837--5851.

[24] Pathak D., Agrawal P., Efros A.G., et al. Curiosity-driven exploration by self-supervised prediction. IEEE CVPRW, 2017, pp. 488--489. DOI: https://doi.org/10.1109/CVPRW.2017.70

[25] Du Y., Han L., Fang M., et al. LIIR: learning individual intrinsic reward in multi-agent reinforcement learning. NIPS, 2019, pp. 4403--4414.

[26] Amato C., Konidaris G.D., Cruz G., et al. Planning for decentralized control of multiple robots under uncertainty. IEEE ICRA, 2015, pp. 1241--1248. DOI: https://doi.org/10.1109/ICRA.2015.7139350

[27] Bellemare M., Srinivasan S., Ostrovski G., et al. Unifying count-based exploration and intrinsic motivation. NIPS, 2016, pp. 1471--1479.

[28] Ostrovski G., Bellemare M.G., van den Oord A., et al. Count-based exploration with neural density models. ICML, 2017, pp. 2721--2730.

[29] Klissarov M., Islam R., Khetarpal K., et al. Variational state encoding as intrinsic motivation in reinforcement learning. ICLR, 2019, pp. 2--7.

[30] Stadie B.C., Levine S., Abbeel P. Incentivizing exploration in reinforcement learning with deep predictive models. arXiv:1507.00814. DOI: https://doi.org/10.48550/arXiv.1507.00814

[31] Sutton R.S., Barto A.G. Reinforcement learning. An introduction. Cambridge, MIT Press, 2018.

[32] Lillicrap T.P., Hunt J.J., Pritzel A., et al. Continuous control with deep reinforcement learning. arXiv:1509.02971. DOI: https://doi.org/10.48550/arXiv.1509.02971

[33] Lowe R., Wu Y., Tamar A., et al. Multi-agent actor-critic for mixed cooperative-competitive environments. NIPS, 2017, pp. 6382--6393.

[34] Kingma D.P., Welling M. Auto-encoding variational Bayes. arXiv:1312.6114. DOI: https://doi.org/10.48550/arXiv.1312.6114