Journal of Computer and Forensic Sciences
https://aseestant.ceon.rs/index.php/jcfs
<p class="MsoNormal" style="margin-bottom: 4.3pt; text-align: justify;"><span style="font-size: 11.0pt; font-family: 'Times New Roman','serif';">The Journal of Computer and Forensic Sciences is an open access, peer-reviewed scientific journal published by the University of Criminal Investigation and Police Studies in Belgrade, covering advanced and innovative research across the fields of computer and forensic sciences. The journal aims to provide a platform through which authors can communicate their viewpoints on diverse but often related aspects of computer and forensic sciences, as well as a source of information that supports research, education, and practice in these fields.</span></p>
University of Criminal Investigation and Police Studies
en-US
Journal of Computer and Forensic Sciences
2956-087X
<p>I (we), the author(s), hereby declare under full moral, financial and criminal liability that the manuscript submitted for publication to the Journal of Computer and Forensic Sciences</p>
<p>a) is the result of my (our) own original research and that I (we) hold the right to publish it;</p>
<p>b) does not infringe any copyright or other third-party proprietary rights;</p>
<p>c) complies with the Journal’s research and publishing ethics standards;</p>
<p>d) has not been published elsewhere, under this or any other title;</p>
<p>e) is not under consideration by another publication, under this or any other title.</p>
<p>I (we) also declare under full moral, financial and criminal liability:</p>
<p>f) that all conflicts of interest that may directly or potentially influence or introduce bias into the work have been disclosed in the manuscript;</p>
<p>g) that if the article is accepted for publication I (we) will transfer all copyright ownership of the manuscript to the University of Criminal Investigation and Police Studies in Belgrade.</p>
<p><span style="color: #1d2228; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12pt;">Signed by the Corresponding Author on behalf of all other authors.</span></p>
A Hybrid Plagiarism Detection Framework Using Lexical and Semantic Similarity with Lightweight Sentence Transformers
https://aseestant.ceon.rs/index.php/jcfs/article/view/64419
<p><span style="font-size: 11.0pt; line-height: 115%; font-family: 'Times New Roman',serif; mso-fareast-font-family: Calibri; mso-fareast-theme-font: minor-latin; mso-ansi-language: EN-IN; mso-fareast-language: EN-US; mso-bidi-language: AR-SA;">Plagiarism detection has become increasingly challenging due to the widespread availability of paraphrasing tools and generative artificial intelligence systems. Traditional plagiarism detection techniques based on lexical similarity, such as TF-IDF and n-gram matching, often fail to identify semantically similar but lexically modified text. This paper presents a hybrid plagiarism detection framework that combines lexical similarity measures with semantic similarity derived from sentence transformer models. The proposed approach integrates TF-IDF-based cosine similarity with lightweight sentence embeddings generated using MiniLM and SBERT models. To enhance semantic detection performance, a MiniLM-based sentence transformer is fine-tuned on the PAN 2011 plagiarism detection corpus. Experimental evaluation demonstrates that the hybrid similarity approach significantly improves detection accuracy compared to purely lexical methods, particularly for paraphrased plagiarism cases. The framework is further validated using threshold-based analysis and real-world web content retrieved through automated scraping. The proposed system provides an efficient and scalable solution for plagiarism detection, balancing computational efficiency with semantic understanding, and is suitable for academic and real-world forensic applications.</span></p>
RAHUL Birwadkar
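<p>The abstract above describes combining a TF-IDF cosine score with an embedding-based semantic score and applying a threshold. The sketch below illustrates one way such a combination could look. It is not the authors' implementation: the weighted average, the weight <code>alpha</code>, the threshold value, and all function names are assumptions made for illustration, and the semantic score is passed in as a plain number that would, in practice, come from a sentence transformer such as MiniLM.</p>

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build smoothed TF-IDF vectors (term -> weight dicts) for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms that occur in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(lexical, semantic, alpha=0.5):
    """Assumed combination rule: weighted average of the two similarity scores."""
    return alpha * lexical + (1 - alpha) * semantic


def is_suspicious(score, threshold=0.6):
    """Flag a pair when the hybrid score reaches the (assumed) threshold."""
    return score >= threshold


if __name__ == "__main__":
    source = "Traditional detectors compare overlapping words between documents."
    paraphrase = "Classic tools look at shared vocabulary across texts."
    vec_source, vec_paraphrase = tfidf_vectors([source, paraphrase])
    lexical = cosine(vec_source, vec_paraphrase)
    semantic = 0.85  # hypothetical value standing in for an embedding cosine score
    score = hybrid_score(lexical, semantic)
    print(lexical, score, is_suspicious(score))
```

<p>A paraphrased pair typically scores low on the lexical component and high on the semantic one, which is why a combined score can flag cases that TF-IDF alone would miss. In a real pipeline the TF-IDF step would more likely use an established library, and the weight and threshold would be tuned on labeled data.</p>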
Copyright (c) 2026 University of Criminal Investigation and Police Studies, Belgrade, Serbia
2026-05-06
2026-05-06
Semantic Paraphrase Generation Using Transformer Architectures: A Comparative Study of Pre-trained and Fine-Tuned Models
https://aseestant.ceon.rs/index.php/jcfs/article/view/64420
<p>Semantic paraphrase generation plays a crucial role in academic and technical writing by enabling authors to restate content while preserving its original meaning. Traditional paraphrasing approaches, such as rule-based rewriting and statistical methods, often struggle to maintain semantic consistency and linguistic fluency, especially for complex or longer text segments. Recent advances in transformer-based architectures have significantly improved text generation capabilities by leveraging contextual representations and self-attention mechanisms. This paper presents a comparative study of pre-trained and fine-tuned transformer models for semantic paraphrase generation. We evaluate encoder–decoder-based transformer architectures, with a primary focus on the BART model in both pre-trained and fine-tuned settings, alongside a large generative language model used for paraphrase generation. The fine-tuning process adapts pre-trained models to paraphrasing tasks using task-specific data, enabling improved control over semantic preservation and output consistency. The evaluation combines quantitative and qualitative analysis, including training and validation loss trends and a comparative examination of generated paraphrases. Experimental results demonstrate that fine-tuned transformer models produce paraphrases with higher semantic fidelity and structural coherence than their pre-trained counterparts, while large generative models offer fluent but less deterministic outputs. The findings highlight the importance of task-specific fine-tuning for controlled and semantically accurate paraphrase generation. This study contributes practical insights into the selection and adaptation of transformer architectures for paraphrasing applications, particularly in academic and research-oriented writing contexts.</p>
RAHUL Birwadkar
Copyright (c) 2026 University of Criminal Investigation and Police Studies, Belgrade, Serbia
2026-05-06
2026-05-06
10.5937/jcfs4-64420
Comparative Analysis of Clustering Textual and Numerical Data Using the K-Means Algorithm
https://aseestant.ceon.rs/index.php/jcfs/article/view/64411
<p class="MsoNormal" style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; text-align: justify; line-height: normal;"><span lang="SR-CYRL-RS" style="font-size: 12.0pt; font-family: 'Times New Roman',serif; mso-fareast-font-family: 'Times New Roman'; mso-fareast-language: SR-CYRL-RS;">This paper presents a comparative analysis of the application of the K-Means clustering algorithm on two different types of data – textual and numerical. The aim of the research was to examine the reliability, stability, and interpretability of the results when the same algorithm is applied to semantically diverse datasets. The textual data were taken from the articles of the Criminal Code of the Republic of Serbia, where clustering was performed after preprocessing and TF-IDF vectorization. The numerical data refer to traffic accident statistics from 2015 to 2021, analyzing parameters such as the number of property-damage-only accidents, the number of injured persons, and the number of fatalities.</span></p> <p class="MsoNormal" style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; text-align: justify; line-height: normal;"><span lang="SR-CYRL-RS" style="font-size: 12.0pt; font-family: 'Times New Roman',serif; mso-fareast-font-family: 'Times New Roman'; mso-fareast-language: SR-CYRL-RS;">The results showed that clustering on textual data produced a relatively clear separation of thematic groups of articles, but with a moderate silhouette coefficient value due to a high degree of semantic similarity among documents. 
On the other hand, clustering on numerical data demonstrated a more stable structure, where the optimal number of clusters was two, indicating the possibility of distinguishing periods with different intensity and severity of traffic accidents.</span></p> <p class="MsoNormal" style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto; text-align: justify; line-height: normal;"><span lang="SR-CYRL-RS" style="font-size: 12.0pt; font-family: 'Times New Roman',serif; mso-fareast-font-family: 'Times New Roman'; mso-fareast-language: SR-CYRL-RS;">It was concluded that the K-Means algorithm provides more reliable and interpretable results for numerical data, while in the case of textual data, it requires more precise vector space modeling and possibly the application of semantic models such as Word2Vec or BERT. The paper serves as a basis for further research in the field of integrating machine learning techniques for analyzing heterogeneous data sources.</span></p>
Sanja Raičević
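<p>The abstract above relies on K-Means clustering and the silhouette coefficient to judge how well the clusters separate. The sketch below shows both on a small synthetic dataset. It is an illustration under stated assumptions, not the paper's code: the points are made up and are not the traffic accident statistics, the evenly spaced initialization is a simplification chosen to keep the run deterministic, and the function names are hypothetical.</p>

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    n = len(points)
    # Simplified deterministic start: evenly spaced input points as centers.
    centers = [points[i * n // k] for i in range(k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic, already scaled feature vectors (not data from the paper).
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    for k in (2, 3):
        print(k, silhouette(data, kmeans(data, k)))
```

<p>Comparing the silhouette value across candidate values of k is one common way to choose the number of clusters. For TF-IDF vectors the same procedure applies, although high-dimensional sparse text vectors often yield lower silhouette values than compact numerical features, which is consistent with the moderate coefficient the abstract reports for the textual data.</p>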
Copyright (c) 2026 University of Criminal Investigation and Police Studies, Belgrade, Serbia
2026-05-06
2026-05-06
10.5937/jcfs4-64411