Efficiency Analysis of Quantization Methods for Optimizing Machine Learning on Microcontrollers for Resource-Limited Embedded Systems
| Authors: Achkasov A.V., Yagodkin A.S., Makarenko F.V. | Published: 15.04.2026 |
| Published in issue: #1(154)/2026 | |
| Category: Informatics, Computer Engineering and Control | Chapter: System Analysis, Control, and Information Processing | |
Abstract
The article presents a comprehensive analysis of quantization methods for optimizing machine learning models for deployment under the resource constraints typical of TinyML. The study covers several quantization schemes, including uniform, logarithmic, and trained quantization, and evaluates their impact on the performance of popular neural network architectures such as MobileNetV1/V2, ResNet-50, ShuffleNetV2, and Mamba. The experimental results show that switching from 32-bit floating-point to 8-bit integer representations reduces model size by a factor of 4, with an accuracy loss of less than 2%. Hybrid mixed-precision schemes demonstrate an optimal balance between compression ratio and accuracy preservation. Measurements on the STM32U5 platform confirm a significant reduction in power consumption: a factor of 4.3 with 8-bit quantization. The article offers practical recommendations for choosing a quantization scheme depending on hardware constraints and the specifics of the task. Promising directions for further research are outlined, in particular the integration of reinforcement learning algorithms for dynamic bit-width selection and the development of hardware-software co-optimization methods for domestic microcontrollers such as the K1879VG1T.
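As an illustration of where the factor of 4 comes from, the following minimal Python sketch applies uniform affine quantization to a handful of weights and compares storage sizes. It is not the authors' implementation: the function names, the per-tensor min/max calibration, and the sample weights are assumptions made for this example, and the small overhead of storing the scale and zero point is ignored.

```python
from array import array


def quantize_uniform_int8(weights):
    """Uniform affine quantization of floats to int8 (illustrative helper).

    Returns (q, scale, zero_point) such that w ~= scale * (q - zero_point).
    """
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 exactly representable
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    q = array(
        "b",  # signed 8-bit storage, one byte per weight
        (max(-128, min(127, round(w / scale) + zero_point)) for w in weights),
    )
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [scale * (v - zero_point) for v in q]


# Hypothetical weights, chosen only for demonstration.
weights = [-0.62, -0.11, 0.0, 0.07, 0.35, 0.91]
q, scale, zp = quantize_uniform_int8(weights)
restored = dequantize(q, scale, zp)

fp32_bytes = array("f", weights).itemsize * len(weights)  # 4 bytes per weight
int8_bytes = q.itemsize * len(q)                          # 1 byte per weight
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} B, int8: {int8_bytes} B")
print(f"scale={scale:.5f}, zero_point={zp}, max abs error={max_error:.5f}")
```

For values that are not clamped, the rounding error is bounded by half the quantization step (`scale / 2`). Wide weight ranges therefore cost accuracy, which is one reason a mixed-precision scheme may keep sensitive layers at a higher bit width.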
Please cite this article in English as:
Achkasov A.V., Yagodkin A.S., Makarenko F.V. Efficiency analysis of quantization methods for optimizing Machine Learning on microcontrollers for resource-limited embedded systems. Herald of the Bauman Moscow State Technical University, Series Instrument Engineering, 2026, no. 1 (154), pp. 59--79 (in Russ.). EDN: EYNMRJ
