Application of text-mining techniques for extraction and analysis of paracetamol and ibuprofen marketed products’ qualitative composition

  • Jelena Djuriš, University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
  • Jovana Pilović, University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
  • Marina Džunić, University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
  • Sandra Cvijić, University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
  • Svetlana Ibrić, University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
Keywords: text mining, dosage forms, qualitative analysis, excipients, paracetamol, ibuprofen


Text mining (TM) applications in biomedicine are attracting growing interest. TM tools can facilitate formulation development by analyzing textual information from patent databases, scientific articles, summaries of product characteristics, and similar sources. The aim of this study was to use TM tools to perform a qualitative analysis of paracetamol (PAR) and ibuprofen (IBU) formulations by identifying and evaluating excipients specific to the active pharmaceutical ingredient (API) and/or the dosage form. A total of 152 products were analyzed. Web scraping was used to retrieve the data, and the Python-based open-source software Orange 3.31.1 was used for TM and for statistical analysis (ANOVA) of the results. The majority of marketed products for both APIs were tablets. The predominant excipients across all tablet formulations were povidone, starch, microcrystalline cellulose and hypromellose. Povidone, stearic acid, potassium sorbate, maize starch and pregelatinized starch occurred more frequently in PAR tablets, whereas titanium dioxide, lactose, shellac, sucrose and ammonium hydroxide were specific to IBU tablets. PAR oral suspensions more frequently contained dispersible cellulose, liquid sorbitol, methyl and propyl parahydroxybenzoate, glycerol and acesulfame potassium. Specific excipients in other PAR dosage forms (effervescent tablets, hard capsules, oral powders, solutions and suspensions), as well as in IBU gels and soft capsules, were also evaluated.
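
The kind of frequency comparison described above can be illustrated with a minimal sketch in plain Python. It is not the Orange workflow used in the study, and the product records, the function names `tokenize` and `excipient_share`, and the splitting rule are all assumptions made for illustration only. The sketch computes, for a given API and dosage form, the share of products that list each excipient.

```python
import re
from collections import Counter

# Hypothetical records (API, dosage form, excipient list as free text).
# These entries are invented for illustration and are NOT data from the study.
PRODUCTS = [
    ("PAR", "tablet", "Povidone, maize starch, stearic acid, potassium sorbate"),
    ("PAR", "tablet", "Pregelatinised maize starch; povidone; stearic acid"),
    ("IBU", "tablet", "Lactose, hypromellose, titanium dioxide, sucrose"),
    ("IBU", "tablet", "Microcrystalline cellulose, lactose, titanium dioxide, shellac"),
]


def tokenize(excipient_text):
    """Split a free-text excipient list on commas/semicolons and normalize case.

    Returns a set, so each excipient is counted at most once per product.
    """
    parts = re.split(r"[;,]", excipient_text)
    return {part.strip().lower() for part in parts if part.strip()}


def excipient_share(products, api, dosage_form):
    """Return {excipient: fraction of matching products that list it}."""
    matching = [
        tokenize(text)
        for product_api, form, text in products
        if product_api == api and form == dosage_form
    ]
    if not matching:
        return {}
    counts = Counter(term for terms in matching for term in terms)
    return {term: n / len(matching) for term, n in counts.items()}


if __name__ == "__main__":
    for api in ("PAR", "IBU"):
        shares = excipient_share(PRODUCTS, api, "tablet")
        for term, share in sorted(shares.items(), key=lambda kv: (-kv[1], kv[0])):
            print(f"{api} tablet  {term}: {share:.2f}")
```

Comparing the resulting shares between the PAR and IBU groups points to candidate API-specific excipients; a real analysis would also need synonym normalization (for example, "pregelatinised" versus "pregelatinized" starch) and a statistical test such as the ANOVA mentioned above.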


Original scientific paper