Comparative Analysis of Machine Learning Models for Real-Time Object Detection

Merlin Wittenhagen, SRH University Heidelberg

DOI: https://doi.org/10.5937/jcfs4-58032

Keywords: Object Detection; Model Benchmarking; Inference Efficiency; Hardware-Aware Evaluation

Abstract

Object detection is a fundamental task in computer vision, with applications ranging from autonomous driving to industrial automation and medical imaging. This report presents a comparative analysis of six well-known object detection models: three small models intended for edge computing and three large models likely better suited to high-performance systems. The models YOLOv10-Nano, MobileNetV3-SSDLite, EfficientDet-D0, Faster R-CNN, YOLOv10-Large and DETR are evaluated and compared in terms of inference speed, accuracy and computational efficiency. The evaluation combines literature-based benchmarks with empirical tests on two systems: an Apple Silicon M1 Pro-based system and an NVIDIA RTX 3080 Ti-powered computer. The results show that the YOLOv10 models consistently outperform the other models in real-time object detection, achieving higher accuracy overall while maintaining significantly lower inference times. The analysis also highlights hardware compatibility issues, in particular with PyTorch's MPS backend on Apple Silicon, which causes serious performance drops in some models. The findings underline the importance of choosing the right model and appropriate hardware for each application scenario.
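As an illustration of how inference time can be measured, the following is a minimal sketch of a latency benchmark with warm-up runs. It is not the evaluation code used in this report: the function `benchmark_latency`, the `warmup` parameter and the stand-in `dummy_detector` are hypothetical names chosen for this example, and a real measurement would replace the stand-in with a model's forward pass.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark_latency(
    infer: Callable[[object], object],
    inputs: Sequence[object],
    warmup: int = 3,
) -> dict:
    """Time `infer` on every input and summarize the per-image latency."""
    # Warm-up runs are not timed, so one-time costs (lazy initialization,
    # caching) do not distort the measurement.
    for sample in inputs[:warmup]:
        infer(sample)

    latencies_ms = []
    for sample in inputs:
        start = time.perf_counter()
        infer(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "runs": len(latencies_ms),
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf"),
    }


def dummy_detector(image):
    # Hypothetical stand-in for a real model's forward pass.
    time.sleep(0.002)
    return []


stats = benchmark_latency(dummy_detector, inputs=list(range(10)))
print(f"{stats['mean_ms']:.2f} ms/image, {stats['fps']:.1f} FPS")
```

On asynchronous backends such as CUDA or MPS, the device would additionally need to be synchronized before each timer reading (in PyTorch, for example, with `torch.cuda.synchronize()` or `torch.mps.synchronize()`); otherwise the timer may only capture the kernel launch rather than the full inference.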
