The New Language Corpus: Exploring Perception of Graphemes 'P' and 'B' in Serbian Cyrillic and Latin Scripts According to Semantical Context

  • Sara Tvrdišić, Faculty of Dramatic Arts, University of Arts in Belgrade
Keywords: Serbian Cyrillic script, perception of graphemes, machine language learning


Language corpora are essential instruments for research in natural language processing. This paper presents part of a research project as an illustration of such a corpus. Serbian is short of linguistic resources, a concern accentuated by its status as one of the few languages with a bialphabetical system: although only the Cyrillic alphabet is in official use, the Latin alphabet is used in parallel. In modern times the Serbian Cyrillic alphabet is threatened by various influences of globalization, namely the increasingly dominant use of the Latin alphabet in everyday life. The primary goal of the research is to examine the perceptual differences between the Latin and Cyrillic alphabets. For this purpose, the sample consisted of subjects whose native language is not Serbian, so that they would not be influenced by prior knowledge of the semantics of the Serbian language; the subjects nevertheless had to know at least the basics of Serbian and its rules of reading and writing in order to understand the questionnaire, whose vocabulary was simplified to level A2. This paper presents a segment of that research through several compared variables, chiefly by examining whether the graphemes 'P' and 'B' can cause confusion, since these graphemes are pronounced differently in Serbian Cyrillic and Latin. The research showed that a higher percentage of subjects perceived unclear words in the script opposite to that of the questionnaire: if the questionnaire was in Cyrillic, a higher percentage of subjects perceived the unclear words as Latin, and vice versa.


