Comparative Analysis of Certain Clustering Algorithms That Do Not Require a Predefined Number of Clusters, Applied to the Articles of the Criminal Code of the Republic of Serbia
Abstract
This paper examines how clustering algorithms that do not require the number of clusters to be specified in advance can be used to analyze the articles of the Criminal Code (CC) of the Republic of Serbia (RS). Three widely used algorithms were applied: DBSCAN, Mean-Shift, and hierarchical (agglomerative) clustering. The input consisted of plain-text (.txt) files, each containing one article of the Code. The study aims to identify thematic groups within the legislative text and to assess the strengths and weaknesses of each algorithm given the specific characteristics of legal documents. The results show that DBSCAN struggles with noise, Mean-Shift effectively detects dense clusters, and hierarchical clustering supports analysis at different levels of granularity. The paper concludes with recommendations for choosing among these algorithms when analyzing similar collections of legal texts.
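The abstract does not state how the articles are turned into vectors. The following dependency-free Python sketch assumes TF-IDF weighting and cosine distance; both are assumptions for illustration and not necessarily the authors' method. It implements two of the three algorithms, DBSCAN and average-linkage agglomerative clustering with a distance threshold, neither of which needs the number of clusters in advance. Mean-Shift is left out to keep the sketch short. The five "articles" are invented placeholders, not text from the CC, and the values of `eps`, `min_samples` and `threshold` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; \w is Unicode-aware, so Serbian letters are kept
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        vec = {t: cnt * idf[t] for t, cnt in Counter(toks).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    # Both vectors are unit length, so their dot product is the cosine similarity
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def dbscan(vectors, eps, min_samples):
    """Label each document with a cluster id, or -1 for noise."""
    n = len(vectors)
    # Neighbourhoods include the point itself (distance 0)
    neighbours = [[j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n  # None means "not yet visited"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only through core points
    return labels


def agglomerative(vectors, threshold):
    """Average-linkage clustering that stops once the closest pair is >= threshold apart."""
    dist = [[cosine_distance(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, x, y = min((linkage(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        if d >= threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    labels = [0] * len(vectors)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Hypothetical toy "articles": invented placeholders, not text from the Criminal Code
articles = [
    "theft of movable property",
    "aggravated theft of movable property",
    "murder of a person",
    "aggravated murder of a person",
    "tax evasion",
]
vectors = tfidf_vectors(articles)
print(dbscan(vectors, eps=0.3, min_samples=2))  # → [0, 0, 1, 1, -1]
print(agglomerative(vectors, threshold=0.5))    # → [0, 0, 1, 1, 2]
```

With these illustrative settings, DBSCAN marks the unrelated document as noise (-1), while the agglomerative sketch keeps it as a singleton cluster. Lowering or raising `threshold` corresponds to cutting the hierarchy at a finer or coarser level of granularity.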
I (we), the author(s), hereby declare under full moral, financial and criminal liability that the manuscript submitted for publication to the Journal of Computer and Forensic Sciences:
a) is the result of my (our) own original research and that I (we) hold the right to publish it;
b) does not infringe any copyright or other third-party proprietary rights;
c) complies with the Journal’s research and publishing ethics standards;
d) has not been published elsewhere, under this or any other title;
e) is not under consideration by another publication, under this or any other title.
I (we) also declare under full moral, financial and criminal liability:
f) that all conflicts of interest that may directly or potentially influence or introduce bias into the work have been disclosed in the manuscript;
g) that, if the article is accepted for publication, I (we) will transfer all copyright ownership of the manuscript to the University of Criminal Investigation and Police Studies in Belgrade.
Signed by the Corresponding Author on behalf of all other authors.
