THE RULE BASED CLASSIFICATION MODELS FOR MHC BINDING PREDICTION AND IDENTIFICATION OF THE MOST RELEVANT PHYSICOCHEMICAL PROPERTIES FOR THE INDIVIDUAL ALLELE

Davorka Jandrlić

doi:10.5937/univtho6-10768

Keywords: MHC-peptide binding, rule-based classification, k-means clustering

Abstract

Binding of proteolyzed protein fragments to MHC molecules is an essential and the most selective step in determining T-cell epitopes. Prediction of MHC-peptide binding is therefore central to anticipating potential T-cell epitopes and is highly relevant to vaccine design. Despite the many methods available for predicting MHC-binding ligands, limitations remain that affect the reliability of most of them. Certain important methods based on physicochemical properties have very low reported accuracy. The aim of this paper is to present a new approach for extracting the most important physicochemical properties that influence the classification of MHC-binding ligands. In this study, we developed rule-based classification models that take into account the physicochemical properties of amino acids and their frequencies. The models use k-means clustering to extract the relevant physicochemical properties. The results indicate that the physicochemical properties of amino acids contribute significantly to peptide binding and that different alleles are characterized by different sets of physicochemical properties.
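The abstract does not say how peptides are encoded or how k-means is applied, so the following is only a minimal sketch of the general idea, not the models developed in the paper. It assumes each peptide is summarized by two features derived from a single property scale (the Kyte-Doolittle hydropathy index, chosen here purely as an example) and clusters those feature vectors with a plain Lloyd's k-means. The function names and the four peptide strings are hypothetical and serve only as illustration.

```python
import random

# Kyte-Doolittle hydropathy index: one example physicochemical scale
# (an assumption; the abstract does not name the properties used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_features(peptide):
    """Return (mean hydropathy, fraction of residues with positive hydropathy)."""
    values = [KD[aa] for aa in peptide]
    mean_h = sum(values) / len(values)
    frac_pos = sum(v > 0 for v in values) / len(values)
    return (mean_h, frac_pos)


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical peptides: two hydrophobic, two charged/polar.
peptides = ["LLVVAIIL", "ILVFAVLM", "KKDEERQS", "DEKRNQEK"]
features = [peptide_features(p) for p in peptides]
centroids, labels = kmeans(features, k=2)
```

In this toy setting the hydrophobic and the polar peptides end up in separate clusters. A rule-based classifier in the spirit of the abstract could then test which cluster or property range a peptide falls into, but the actual rules and property sets per allele would have to come from the paper itself.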

Author Biography

Davorka Jandrlić, Faculty of Mechanical Engineering

Department of Mathematics
