Influence of pre-processing on anomaly-based intrusion detection

  • Danijela Protić Serbian Armed Forces, General Staff, Department for Telecommunication and Informatics (Ј-6), Center for Applied Mathematics and Electronics, Belgrade, Republic of Serbia https://orcid.org/0000-0003-0827-2863
Keywords: anomaly-based intrusion detection, machine learning, Kyoto 2006

Abstract


Introduction/purpose: The anomaly-based intrusion detection system detects intrusions based on a reference model which identifies the normal behavior of a computer network and flags an anomaly. Machine-learning models classify intrusions or misuse as either normal or anomaly. In complex computer networks, the number of training records is large, which makes the evaluation of the classifiers computationally expensive.

Methods: A feature selection algorithm that reduces the dataset size is presented in this paper.

Results: The experiments are conducted on the Kyoto 2006+ dataset and four classifier models: feedforward neural network, k-nearest neighbor, weighted k-nearest neighbor, and medium decision tree. The results show high accuracy of the models, as well as low false positive and false negative rates.

Conclusion: The three-step pre-processing algorithm for feature selection and instance normalization resulted in improving performances of four binary classifiers and in decreasing processing time.

References

Ambedkar C. & Kishore Babu, V.2015. Detection of Probe Attacks Using Machine Learning Techniques. International Journal of Research Studies in Computer Science and Engineering, 2(3), pp.25-29 [online]. Available at: https://www.arcjournals.org/pdfs/ijrscse/v2-i3/7.pdf [Accessed: 29 June 2020].

Ashok Kumar, D. & Venugopalan, S.R. 2018.A Novel algorithm for Network Anomaly Detection using Adaptive Machine Learning. Singapore: Springer Singapore.

Kwak, Y.T., Hwang, J.W., & Yoo, C.J. 2011. A new damping strategy of Levenberg-Marquardt algorithm for multilayer perceptrons. Neural Network World, 21(4), pp.327-340. Available at: https://doi.org/10.14311/NNW.2011.21.020.

Levenberg, K. 1944. A method for the solution of certain problems in least squares. Quarterly of Applied Mathematics, 2, pp.164-168 Available at: https://doi.org/10.1090/qam/10666.

Marquardt, D.W. 1963. An Algorithm for Least-Squares Estimation of Nonlinear Parameters. Journal of the Society for Industrial and Applied Mathematics, 11(2), pp.431-441 [online]. Available at: https://www.jstor.org/stable/2098941?seq=1 [Accessed: 29 June 2020].

Nguyen, T.T.T. & Armitage, G. 2008. A Survey of Techniques for Internet Traffic Classification using Machine Learning. IEEE Communications Surveys & Tutorials, 10(4), pp.56-76. Available at: https://doi.org/10.1109/SURV.2008.080406.

Protić, D.D. 2018. Review of KDD CUP ’99, NSL-KDD and KYOTO 2006+ Datasets. Vojnotehnički glasnik/Military Technical Courier, 66(3), pp.580-596. Available at: https://doi.org/10.5937/vojtehg66-16670.

Protić, D. & Stanković, M. 2018. Anomaly-Based Intrusion Detection: Feature Selection and Normalization Influence to the Machine Learning Models Accuracy. European Journal of Formal Sciences and Engineering, 2(3), pp.101-106. Available at: http://dx.doi.org/10.26417/ejef.v2i3.p101-106.

Protić, D. & Stanković, M. 2020. Detection of Anomalies in the Computer Network Behavior. European Journal of Formal Sciences and Engineering, 4(1), pp.7-13. Available at: http://dx.doi.org/10.26417/ejef.v4i1.p7-13.

Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D. & Nakao, K. 2011. Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset forNIDS Evaluation. In: Proc. 1st Work-shop on BADGES - Building Anal. Datasets and Gathering Experience Returns for Security, Salzburg, pp.29-36, April 10-13. Available at: https://doi.org/10.1145/1978672.1978676.

-Split. 2020. What is false positive rate? [online]. Available at: https://www.split.io/glossary/false-positive-rate/ [Accessed: 29 June 2020].

Shirabad, J.S., Lethbridge, T.C. & Matwin, S. 2007. Modeling Relevance Relations Using Machine Learning Techniques. In: Zhang, D. & Tsai, J.J.P. (Eds.) Advances in Machine Learning Applications in Software Engineering, Chapter VIII, pp.168-207. Hershey, PA: Idea Group Pub. (IGI Global research collection). Available at: https://doi.org/10.4018/978-1-59140-941-1.ch008.

-Takakura. 2020. Traffic Data from Kyoto University’s Honeypots [online]. Available at: http://www.takakura.com/kyoto_data/ [Accessed: 29.06.2020].

Tsigkritis, T., Groumas G. & Schneider M. 2018. On the Use of k-NN in Anomaly Detection. Journal of Information Security, 9(1), pp.70-84. Available at: https://doi.org/10.4236/jis.2018.91006.

Published
2020/06/01
Section
Original Scientific Papers