Comparative Analysis of Certain Clustering Algorithms That Do Not Require Predefined Number of Clusters on the Articles of the Criminal Code of the Republic of Serbia

  • Sanja Raičević, Ministry of Interior
  • Vojkan Nikolić, University of Criminal Investigation and Police Studies
Keywords: clustering, clustering algorithms, DBSCAN, Mean-Shift, hierarchical clustering, Criminal Code, textual analysis, undefined number of clusters, legislative text analysis, thematic groups, computational data analysis, legal documents, automated cluster analysis

Abstract


This paper explores the application of selected clustering algorithms to textual documents from the Criminal Code (CC) of the Republic of Serbia (RS). Clustering was performed with three popular algorithms that do not require a predefined number of clusters: DBSCAN, Mean-Shift, and hierarchical (agglomerative) clustering. The input data consisted of textual documents in .txt format, each corresponding to a single article of the law. The aim of the study is to identify thematic groups within the legislative text and to analyze the advantages and disadvantages of each algorithm given the specific characteristics of legal documents. The results show that DBSCAN faces challenges with noise, Mean-Shift effectively detects dense clusters, and hierarchical clustering allows detailed analysis at various levels of granularity. The paper concludes with insights into applying these algorithms to legal texts and recommendations for choosing among them when analyzing similar datasets.
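
The abstract does not describe the implementation, so the following is only a minimal sketch of how a density-based method such as DBSCAN can group short texts without a preset cluster count and label isolated documents as noise. Everything specific here is an assumption made for illustration: the six snippets are invented placeholders (not text from the Criminal Code), Jaccard distance over word sets stands in for whatever text representation the authors used, and the `eps` and `min_samples` values are arbitrary.

```python
# Illustrative sketch only: standard-library DBSCAN over Jaccard distances.
# The snippets below are invented placeholders, NOT real Criminal Code text.
docs = [
    "theft of movable property",
    "aggravated theft of movable property",
    "theft of movable property by force",
    "abuse of official position",
    "abuse of official position by bribery",
    "unauthorized access to a computer network",
]

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two non-empty sets of tokens."""
    return 1.0 - len(a & b) / len(a | b)

def dbscan(dist, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise.

    min_samples counts the point itself, as in common library conventions.
    """
    n = len(dist)
    labels = [None] * n  # None = not yet visited
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Tokenize each document into a set of lowercase words.
token_sets = [set(d.lower().split()) for d in docs]
dist = [[jaccard_distance(a, b) for b in token_sets] for a in token_sets]

# eps and min_samples are arbitrary illustrative values (assumptions).
labels = dbscan(dist, eps=0.5, min_samples=2)
for label, text in zip(labels, docs):
    print(label, text)
```

With these placeholder inputs the three theft snippets fall into one cluster, the two abuse-of-position snippets into another, and the computer-network snippet is labeled -1 (noise). The number of clusters is never passed in; it emerges from `eps` and `min_samples`, which is also why noise handling depends strongly on those two values.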

Published
2024/12/25
Section
Articles