Influence of pre-processing on anomaly-based intrusion detection
Abstract
Introduction/purpose: An anomaly-based intrusion detection system detects intrusions using a reference model that describes the normal behavior of a computer network and flags deviations from that behavior as anomalies. Machine-learning models classify observed behavior as either normal or anomalous. In complex computer networks, the number of training records is large, which makes the evaluation of the classifiers computationally expensive.
Methods: A feature selection algorithm that reduces the dataset size is presented in this paper.
Results: The experiments are conducted on the Kyoto 2006+ dataset and four classifier models: feedforward neural network, k-nearest neighbor, weighted k-nearest neighbor, and medium decision tree. The results show high accuracy of the models, as well as low false positive and false negative rates.
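For readers unfamiliar with the reported metrics, the following minimal Python sketch shows how accuracy, false positive rate, and false negative rate are conventionally derived from the confusion counts of a binary classifier. The function name `binary_rates` and the example counts are illustrative assumptions; they are not taken from the paper and do not reproduce its results.

```python
def binary_rates(tp, tn, fp, fn):
    """Return accuracy, false positive rate and false negative rate.

    tp/fn count anomalous records predicted as anomaly/normal,
    tn/fp count normal records predicted as normal/anomaly.
    (Hypothetical helper for illustration only.)
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one record is required")
    accuracy = (tp + tn) / total
    # FPR: share of normal records wrongly flagged as anomalies.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # FNR: share of anomalous records that were missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"accuracy": accuracy, "fpr": fpr, "fnr": fnr}


if __name__ == "__main__":
    # Made-up counts, not results from the Kyoto 2006+ experiments.
    print(binary_rates(tp=90, tn=95, fp=5, fn=10))
```

A low false positive rate means few normal connections raise alarms, while a low false negative rate means few intrusions go undetected; accuracy alone can hide either problem when the classes are imbalanced.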
Conclusion: The three-step pre-processing algorithm for feature selection and instance normalization improved the performance of the four binary classifiers and decreased processing time.
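The abstract does not list the three steps, so the sketch below is only a generic illustration of what a feature selection and normalization pipeline of this kind might look like. All three steps (dropping non-numeric columns, dropping constant columns, and min-max scaling to [-1, 1]) and the function name `preprocess` are assumptions made for illustration, not the authors' algorithm.

```python
def preprocess(rows):
    """Generic three-step pre-processing sketch (assumed steps, see text).

    rows: list of equal-length records with mixed value types.
    Returns (scaled_rows, kept_column_indices).
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])

    def is_number(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    # Step 1 (assumed): keep only columns that are numeric in every record.
    numeric = [j for j in range(n_cols) if all(is_number(r[j]) for r in rows)]

    # Step 2 (assumed): drop constant columns, which carry no information.
    lo = {j: min(r[j] for r in rows) for j in numeric}
    hi = {j: max(r[j] for r in rows) for j in numeric}
    kept = [j for j in numeric if lo[j] != hi[j]]

    # Step 3 (assumed): scale each remaining feature linearly to [-1, 1].
    scaled = [
        [2 * (r[j] - lo[j]) / (hi[j] - lo[j]) - 1 for j in kept]
        for r in rows
    ]
    return scaled, kept


if __name__ == "__main__":
    # Toy records, not Kyoto 2006+ data.
    toy = [[0, "tcp", 5, 10.0], [2, "udp", 5, 20.0], [4, "tcp", 5, 30.0]]
    print(preprocess(toy))
```

Reducing the number of columns before training is what lowers processing time, and bringing all features to a common range keeps distance-based classifiers such as k-nearest neighbor from being dominated by features with large numeric ranges.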