Review of KDD Cup ’99, NSL-KDD and Kyoto 2006+ datasets
Abstract
This paper presents a review of three datasets, namely KDD Cup ’99, NSL-KDD and Kyoto 2006+, which are widely used in research on intrusion detection in computer networks. The KDD Cup ’99 dataset consists of about five million records, each described by 41 features and labeled either as normal traffic or as an attack belonging to one of four classes: Probe, DoS, U2R and R2L. Because it was generated by simulation over a virtual computer network, the KDD Cup ’99 dataset cannot reflect real traffic data. In the NSL-KDD dataset, redundant records of the KDD Cup ’99 dataset are removed from the training set and duplicate records are removed from the test set. The Kyoto 2006+ dataset is built on three years of real network traffic data, labeled as normal (no attack), attack (known attack) or unknown attack. It contains 14 statistical features derived from the KDD Cup ’99 dataset and 10 additional features.
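To make the four-class labeling and the duplicate removal concrete, the following Python sketch maps the attack names used in the KDD Cup ’99 training data to the DoS, Probe, R2L and U2R classes and drops exact duplicate records. It assumes the comma-separated layout of the distributed files (41 feature values followed by the label, which carries a trailing period in the original KDD Cup ’99 files; NSL-KDD rows add a further column after the label, so index 41 is still the label). The helper names `record_class` and `drop_duplicates` are hypothetical, the mapping deliberately omits attack types that occur only in the test set, and exact-duplicate removal is only one of the steps used to build NSL-KDD, not a reproduction of that procedure.

```python
from collections import Counter

# Partial mapping of KDD Cup '99 *training-set* attack names to the four classes.
# Attack types that appear only in the test set are deliberately omitted (assumption
# of this sketch); anything not listed is reported as "unknown".
ATTACK_CLASS = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def record_class(line: str) -> str:
    """Return 'normal', one of the four attack classes, or 'unknown'
    for one comma-separated record (41 features followed by the label)."""
    fields = line.strip().split(",")
    label = fields[41].rstrip(".")  # original files write labels like 'smurf.'
    if label == "normal":
        return "normal"
    return ATTACK_CLASS.get(label, "unknown")


def drop_duplicates(lines):
    """Keep the first occurrence of each distinct record, preserving order."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()  # ignore trailing newlines when comparing records
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique


if __name__ == "__main__":
    features = ["0"] * 41  # placeholder feature values, not real traffic
    records = [
        ",".join(features + ["smurf."]),
        ",".join(features + ["smurf."]),  # exact duplicate of the previous record
        ",".join(features + ["normal."]),
        ",".join(features + ["guess_passwd."]),
    ]
    unique = drop_duplicates(records)
    print(len(records), "records ->", len(unique), "after duplicate removal")
    print(Counter(record_class(r) for r in unique))
```

Counting classes after deduplication, as in the usage block, is a simple way to see how removing repeated records changes the class balance that a classifier is trained on.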
Proposed Policy for Military Technical Courier (Journals That Offer Open Access)
Authors who publish with this journal agree to the following terms:
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).