Keywords: classification, clustering, Python, information security awareness


This paper reports our experience building a machine learning model of the human aspects of information security awareness. The model was built using classification and clustering approaches, following a general process of importing data, handling incomplete data, compiling datasets, feature scaling, building models, and evaluating models. The dataset was compiled from the results of the Human Aspects of Information Security Questionnaire (HAIS-Q) administered to members of Indonesian society. The classification models were evaluated using several methods, including k-fold cross-validation, the confusion matrix, receiver operating characteristic (ROC) analysis, and score calculation for each model. Among the classification algorithms used, the Support Vector Machine achieved an accuracy of 99.7% and an error rate of 0.3%. Among the clustering algorithms, DBSCAN yielded an adjusted Rand index that was consistently close to 0.
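The process outlined in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the HAIS-Q responses, which are not reproduced here. The imputation strategy, kernel, fold count, and DBSCAN parameters (`eps`, `min_samples`) are illustrative assumptions, not the settings used in the study, and the numbers it produces are unrelated to the reported results.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import adjusted_rand_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Importing data: synthetic stand-in for the questionnaire dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)

# Simulate incomplete data by blanking roughly 5% of the entries.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Handling incomplete data, feature scaling, and the SVM in one pipeline,
# so that imputation and scaling are fitted on training folds only.
clf = make_pipeline(SimpleImputer(strategy="mean"),
                    StandardScaler(),
                    SVC(kernel="rbf"))

# k-fold cross-validation (k = 10 is an assumed value).
cv_scores = cross_val_score(clf, X_train, y_train, cv=10)

# Confusion matrix, accuracy, and error rate on the held-out test set.
clf.fit(X_train, y_train)
cm = confusion_matrix(y_test, clf.predict(X_test))
accuracy = np.trace(cm) / cm.sum()   # correct predictions / all predictions
error_rate = 1.0 - accuracy          # e.g. 99.7% accuracy gives 0.3% error

# Area under the ROC curve, computed from the SVM decision scores.
auc = roc_auc_score(y_test, clf.decision_function(X_test))

# Clustering: DBSCAN on the imputed, scaled features, compared with the
# known labels through the adjusted Rand index (values near 0 indicate
# agreement no better than chance).
X_scaled = make_pipeline(SimpleImputer(strategy="mean"),
                         StandardScaler()).fit_transform(X)
cluster_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_scaled)
ari = adjusted_rand_score(y, cluster_labels)

print(f"10-fold CV accuracy: {cv_scores.mean():.3f}")
print(f"test accuracy: {accuracy:.3f}, error rate: {error_rate:.3f}")
print(f"ROC AUC: {auc:.3f}, adjusted Rand index: {ari:.3f}")
```

Accuracy and error rate are complementary by construction, which is why the abstract's 99.7% and 0.3% sum to 100%. The adjusted Rand index is corrected for chance, so a value near 0 means the cluster assignments agree with the reference labels about as often as a random labelling would.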


Original Scientific Paper