
Knowledge Transfer for LLM-Based Machine Learning Algorithms in Multi-Agent Systems

Authors: Morozov K.A., Alfimtsev A.N.
Published: 15.04.2026
Published in issue: #1(154)/2026
DOI:
Category: Informatics, Computer Engineering and Control | Chapter: Mathematical Support and Software for Computers, Computer Complexes and Networks
Keywords: multi-agent reinforcement learning, large language models, machine learning, reasoning

Abstract

The ability of large language models (LLMs) to handle intellectual tasks is valuable in many environments that require decisions based on publicly available information. In reinforcement learning, and especially in multi-agent reinforcement learning, it is important, regardless of the overall complexity of the environment, to achieve significant results from essentially simple actions, results that may seem impossible in retrospect. This article examines the use of the large language model Mistral-7B-Instruct-v0.3 in multi-agent reinforcement learning. A method of interacting with the LLM is developed so that the model's reasoning can be used for planning and distributing actions. The LLM's reflection on the outcome of the actions it designated as necessary to reach the goal set in the environment is assessed. The implemented knowledge transfer from the LLM makes it possible to apply successful approaches to multi-agent reinforcement learning problems in a grid-world environment. Machine learning algorithms that can make effective use of information obtained from interaction with the LLM are compared experimentally. The proposed method embeds the structure of LLM reasoning into the training of a multi-agent system.
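To make the idea of planning and distributing actions concrete, the sketch below shows one way a textual LLM answer could be turned into per-agent action sequences and checked against goals in a small grid world. It is an illustration only: the answer format (`agent_0: right, up`), the action names, the grid size, and the helper names `parse_plan` and `rollout` are all assumptions, not details taken from the article, and the LLM answer is a hard-coded string rather than real model output.

```python
# Hypothetical action vocabulary; the article does not specify one.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0), "stay": (0, 0)}


def parse_plan(text):
    """Parse lines like 'agent_0: right, up' into {agent: [actions]}.

    Lines without a colon are treated as free-form reasoning and skipped;
    action names outside MOVES are dropped.
    """
    plan = {}
    for line in text.strip().splitlines():
        if ":" not in line:
            continue
        agent, _, actions = line.partition(":")
        steps = [a.strip().lower() for a in actions.split(",")]
        plan[agent.strip()] = [a for a in steps if a in MOVES]
    return plan


def rollout(plan, starts, size):
    """Apply each agent's actions on a size x size grid, clamping at the walls."""
    positions = dict(starts)
    for agent, actions in plan.items():
        if agent not in positions:
            continue  # the plan mentions an agent that is not in the environment
        x, y = positions[agent]
        for action in actions:
            dx, dy = MOVES[action]
            x = min(max(x + dx, 0), size - 1)
            y = min(max(y + dy, 0), size - 1)
        positions[agent] = (x, y)
    return positions


# Hard-coded stand-in for an LLM answer (assumed format, not real model output).
llm_answer = """
Agent 0 is closer to the goal on the right, so it should go there.
agent_0: right, right, down
agent_1: up, left, jump
"""

plan = parse_plan(llm_answer)
final = rollout(plan, {"agent_0": (0, 0), "agent_1": (3, 3)}, size=4)
goals = {"agent_0": (2, 1), "agent_1": (2, 2)}
# A simple check of whether the proposed plan actually reaches each goal.
reached = {agent: final[agent] == goals[agent] for agent in goals}
```

A per-agent check such as `reached` is one simple signal that could be fed back to the model or to a learning algorithm; how the article actually assesses the LLM's reflection and transfers knowledge is not described in the abstract.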

This work was carried out within the State Assignment (no. FSFN-2024-0059).

Please cite this article in English as:

Morozov K.A., Alfimtsev A.N. Knowledge transfer for LLM-based Machine Learning algorithms in multi-agent systems. Herald of the Bauman Moscow State Technical University, Series Instrument Engineering, 2026, no. 1 (154), pp. 80--95 (in Russ.). EDN: FIHEJC
