The New Language Corpus: Exploring Perception of Graphemes 'P' and 'B' in Serbian Cyrillic and Latin Scripts According to Semantical Context

  • Sara Tvrdišić, Faculty of Dramatic Arts, University of Arts in Belgrade
Keywords: Serbian Cyrillic script, perception of graphemes, machine language learning


Language corpora are essential instruments for research in natural language processing. As an illustration of such a corpus, this paper presents part of a conducted study. Serbian is deficient in linguistic resources, a concern accentuated by its status as one of the few languages with a bialphabetical system: although only the Cyrillic alphabet is in official use, the Latin alphabet is used in parallel. In modern times the Serbian Cyrillic alphabet is threatened by various influences of globalization, namely the more dominant use of the Latin alphabet in everyday life. The primary goal of the research is to examine the perceptual differences between the Latin and Cyrillic alphabets. For the purposes of the research, the sample consisted of subjects whose native language is not Serbian, so that their responses would not be influenced by prior knowledge of Serbian semantics. The subjects needed to know at least the basics of the Serbian language and its rules of reading and writing in order to understand the questionnaire, which was simplified to A2-level vocabulary. This paper presents a segment of the research through several compared variables, chiefly by examining whether the graphemes 'P' and 'B' can cause confusion, since these graphemes are pronounced differently in Serbian Cyrillic and Latin. The results showed that subjects more often perceived unclear words in the script opposite to that of the questionnaire: when the questionnaire was in Cyrillic, a higher percentage of subjects read the unclear words as Latin, and vice versa.

