Data mining techniques applied in the analysis of historical data

  • Jovana Kovačević Hemofarm AD, Product development
  • Aleksandar Kovačević University of Novi Sad – Faculty of Technical Sciences, Department of Computing and Control Engineering
  • Tijana Miletić Hemofarm AD, Product development
  • Jelena Djuriš University of Belgrade – Faculty of Pharmacy, Department of Pharmaceutical Technology and Cosmetology
  • Svetlana Ibrić
Keywords: drug manufacturing, gastro-resistant pellets, modelling, release profile, acid-resistance


Understanding how formulation characteristics and process parameters affect the physicochemical properties of a pharmaceutical product is important in the development of solid dosage forms, because knowledge gained on small-scale batches in early development is used in later phases of the product lifecycle or in the development of other products. One way to gain a better understanding of the effects of the formulation and production process on the quality of the finished product is a systematic approach in which experiments are performed according to a predefined factorial or fractional factorial experimental plan. Often, however, the available data were gathered in a non-systematic way, because experiments were not performed according to a predetermined experimental plan. In such cases, data mining techniques can be used to extract useful information from the historical data set. In this research, the possibility of using several data mining techniques to build models that describe the effect of formulation characteristics on the acid resistance and dissolution profile of a model drug from gastro-resistant pellets was investigated. The model drug used in the research is duloxetine hydrochloride, an antidepressant. It belongs to class II of the Biopharmaceutics Classification System (BCS), and its release profile therefore needs to be characterized at a larger number of tested time points. The developed models can be used for planning future laboratory trials or in the development of other products.
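To illustrate what a predefined factorial or fractional factorial experimental plan looks like, the following minimal Python sketch enumerates a two-level full factorial plan and a half-fraction of it. The three factor names are hypothetical placeholders and are not taken from the study; the half-fraction assumes the defining relation I = ABC.

```python
from itertools import product

# Hypothetical factors, coded -1 (low) and +1 (high).
# The names are illustrative assumptions, not the factors used in the study.
FACTORS = ["coating_level", "plasticizer_level", "pellet_size"]


def full_factorial(n_factors):
    """Return all 2**n_factors combinations of coded levels -1/+1."""
    return list(product((-1, 1), repeat=n_factors))


def half_fraction(runs):
    """Keep the runs whose coded levels multiply to +1.

    For three factors this corresponds to the defining relation I = ABC.
    """
    kept = []
    for run in runs:
        sign = 1
        for level in run:
            sign *= level
        if sign == 1:
            kept.append(run)
    return kept


full = full_factorial(len(FACTORS))   # 8 runs for 3 two-level factors
half = half_fraction(full)            # 4 runs in the half-fraction

for run in half:
    print(dict(zip(FACTORS, run)))
```

Historical data gathered without such a plan generally do not cover the factor space in this balanced way, which is the situation in which the data mining techniques discussed here are applied instead.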



Original scientific paper