THE RULE BASED CLASSIFICATION MODELS FOR MHC BINDING PREDICTION AND IDENTIFICATION OF THE MOST RELEVANT PHYSICOCHEMICAL PROPERTIES FOR THE INDIVIDUAL ALLELE
Abstract
Binding of proteolyzed protein fragments to MHC molecules is an essential step, and the most selective one, in determining T-cell epitopes. Prediction of MHC-peptide binding is therefore central to anticipating potential T-cell epitopes and is highly relevant to vaccine design. Despite the numerous methods for predicting MHC-binding ligands, limitations remain that affect the reliability of most of them. Certain important methods based on physicochemical properties have very low reported accuracy. The aim of this paper is to present a new approach to extracting the most important physicochemical properties that influence the classification of MHC-binding ligands. In this study, we developed rule-based classification models that take into account the physicochemical properties of amino acids and their frequencies. The models use the k-means clustering technique to extract the relevant physicochemical properties. The results indicate that the physicochemical properties of amino acids contribute significantly to peptide binding and that different alleles are characterized by different sets of physicochemical properties.
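As a rough illustration of the kind of representation described above, the following Python sketch describes each peptide by frequency-weighted averages of two amino acid properties and groups the peptides with a plain k-means (Lloyd's) loop. Everything in it is an assumption made for illustration: the choice of the Kyte-Doolittle hydropathy scale and a simplified charge assignment, the toy 9-mer sequences (arbitrary strings, not measured binders), the helper names, and the use of k = 2. The abstract does not specify how k-means is applied in the actual models, so this should not be read as the authors' procedure.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index per amino acid (assumed choice of property).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (assumption: all other
# residues, including His, are treated as neutral).
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def peptide_features(peptide):
    """Return (mean hydropathy, mean charge), weighting each residue's
    property value by how often the residue occurs in the peptide."""
    counts = Counter(peptide)
    n = len(peptide)
    hydropathy = sum(HYDROPATHY[aa] * c for aa, c in counts.items()) / n
    charge = sum(CHARGE.get(aa, 0.0) * c for aa, c in counts.items()) / n
    return (hydropathy, charge)


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm. The first k points serve as the initial
    centroids, which keeps this sketch deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Toy 9-mers, invented for illustration only (not real binding data).
peptides = [
    "LLVIAVLFA", "KDEKRDEKE", "ILVAMFLVI",
    "RKEDDKERK", "AVLIFLLMV", "DEKKRREDE",
]
points = [peptide_features(p) for p in peptides]
labels, centroids = kmeans(points, k=2)

for peptide, label in zip(peptides, labels):
    print(peptide, "-> cluster", label)
```

In this toy setting the hydrophobic and the charged sequences fall into separate clusters. A real analysis would need many more property scales, measured binding data for each allele, and a principled choice of k.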
Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.