A MATHEMATICAL APPROXIMATION TO LEFT-SIDED TRUNCATED NORMAL DISTRIBUTION BASED ON HART’S MODEL

Left-sided truncated distributions (LSTD) arise in many industrial situations. For example, the life distribution of used devices is a left-sided truncated distribution. Moreover, if a lower specification limit exists without an upper specification limit, the product distribution is truncated from the left side. The left-sided truncated normal distribution (LSTND) is the special case in which the original distribution is normal. LSTND characteristics, as well as cumulative probabilities, can be difficult to evaluate manually, and most practitioners rely largely on specialized (and expensive) software. In many cases, practitioners are reluctant to purchase software because they need only a limited number of estimations. This paper provides an accurate and straightforward approximation to the cumulative distribution function of the LSTND. Hart's approximation to the normal distribution is simplified and used as the foundation of this model. The maximum absolute error at different truncation points (z_L) over the definition range [z_L, ∞) is as follows: 0.004303 for z_L = -4, 0.00432 for z_L = -3, 0.00449 for z_L = -2, 0.005727 for z_L = -1, and 0.0106 for z_L = 0. Even the maximum errors are negligible in probability applications. Further, it is rare to find a truncation point higher than -2 in industry.


INTRODUCTION
Truncated normal distributions play a crucial role in statistics and probability. Specifically, screening or thresholding the outputs of a dataset from the left side of a normal distribution requires the Left-Sided Truncated Normal Distribution (LSTND) function. Industrial examples of the use of the LSTND function are plentiful. In manufacturing, if the acceptable product performance must be higher than a certain value, for example, the thin-film roughness of produced fuel cells must be greater than 0.01 mm, then the distribution of the acceptable roughness data is truncated on the left side of the normal distribution. The level (0.01) in this case is called a lower specification limit (LSL). Donald Wheeler, in his book Understanding Industrial Experimentation, Second Ed. (SPC Press, 1992), introduced the specification limits. He said, "The traditional approach to the problem of product variation has been that of specifications." [1] Further, manufacturers have hoped to use specification limits to narrow variations in a process's target performance. Such targets (say, target ± Δy) place acceptable bounds on the product's performance. The product is considered satisfactory when the quality characteristic Y falls within the specified limits, and unsatisfactory when the value of Y falls outside the limits. In the latter case, certain actions have to be taken to rectify the situation. Although these limits are artificial boundaries, they are used to make strict decisions about the performance and quality of the products. Put another way, specification limits were created to deal with the variation in product characteristics, and they are used to categorize products as 'good/acceptable' or 'bad/unacceptable'.
Usually, in large-scale flexible manufacturing, the distribution that remains after screening out the bad products is not treated as a left-sided truncated distribution, because the percentage of screened products is negligible. In some cases the fraction of unacceptable parts is as low as 3.4 ppm, and the Six Sigma methodology was indeed developed around this level. However, this level of precision is not realistic for many products, as many of them are subject to high variation. In the solar cell industry, for example, the efficiency of the manufactured cells varies considerably, with up to 10% of the cells removed as scrap. In the manufacturing of jewelry, the product variation is high as well. In general, a significant portion of handmade products is removed for rework or scrap. Besides its use in manufacturing and quality engineering [2], the truncated distribution is involved in many other fields, such as lifetime studies (e.g., [3][4]), economics (e.g., [5][6][7][8]), and water resources (e.g., [9]). Reference [7] uses the recursive moment formula to study the moments of the normal distribution truncated below zero. Liquet and Nazarathy [12] used an ordinary differential equation (ODE) to study the moments that fit truncated univariate distributions. Hoffmann and Vetter [13] compared both the normal truncated empirical distribution and the Lévy function to the Gaussian process, and they found weak convergence between these two cases. Finally, Sakaguchi et al. [14], in studying textile applications, used a truncated distribution to estimate tsumugi width.

Left-sided Truncated Normal Distribution (LSTND)
Assuming the variations caused by dependent variables are distributed normally, the variations of good products (after trimming values below the lower specification limit) follow the LSTND.
In this case, the normal density function is scaled up to compensate for the area truncated on the left, keeping the area under the density at 1 (see Figure 1).
The normal distribution is arguably the most important and most popular probability distribution. There are two main reasons why this distribution has such an impact on all fields of probability and statistics (e.g., reliability, quality control, management, etc.). First, quantities that arise from many independent trials tend to be normally distributed. Measurement errors are an example of such independent trials. The binomial distribution is the discrete version of the distribution of independent trials, and as the number of trials increases to infinity, the binomial distribution becomes closer and closer to the normal distribution. Second, the distribution of the means of different samples taken from the same population is approximately normal, according to the Central Limit Theorem (CLT).
The normal distribution has a symmetric bell shape about the center of the distribution, the mean. Besides the mean, the normal distribution is defined by another parameter, the variance. The variance measures how far a random variable spreads from its mean, as the expected squared deviation. The probability density function of the normal distribution is given in Equation (1):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}$$

$$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) \tag{2}$$

In Equation (1), μ represents the mean, and σ represents the standard deviation. Equation (2) describes the standard normal distribution, where the mean is equal to 0 and the standard deviation is equal to 1. Please note that φ(z) in Equation (2) corresponds to f(x) in Equation (1).
Equation (3) is a transformation formula that can be used to transform a normal distribution with any μ and σ to the standard normal distribution. It gives the z-score on the standard normal distribution that corresponds to a value x on the normal distribution of interest.

$$z = \frac{x - \mu}{\sigma} \tag{3}$$
The cumulative distribution function, F(x), refers to the probability that the random variable X is less than or equal to an observed value x, that is:

$$F(x) = P(X \le x) \tag{4}$$

So we know several things: F(x) is bounded below by 0 and bounded above by 1 (because a probability cannot lie outside [0, 1]), and it has to be increasing (or at least non-decreasing) with x. Equations (5) and (6) give the cumulative distribution functions of the normal and standard normal distributions, respectively.

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt \tag{5}$$

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt \tag{6}$$
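In the standard normal distribution case, Φ represents the cumulative distribution function. Note that the cumulative normal distribution function has no closed-form expression, so it cannot be evaluated manually. However, statisticians developed a specialized table, called the z-table, for manual use. The z-table lists the values of the standard normal cumulative distribution function. A non-standard normal distribution can be handled with the z-table by first using the transformation formula to find the corresponding z-score. In the case of left-sided truncated distributions, the domain of the distribution function changes to [x_L, ∞). The density is rescaled by the area that remains after truncation, as in Equation (7) and Equation (8), and Equation (9) and Equation (10) give the cumulative distribution functions of the LSTND for the normal and the standard normal distributions, respectively.

$$f_T(x) = \frac{f(x)}{1 - F(x_L)}, \quad x \ge x_L \tag{7}$$

$$\phi_T(z) = \frac{\phi(z)}{1 - \Phi(z_L)}, \quad z \ge z_L \tag{8}$$

$$F_T(x) = \frac{F(x) - F(x_L)}{1 - F(x_L)}, \quad x \ge x_L \tag{9}$$

$$\Phi_T(z) = \frac{\Phi(z) - \Phi(z_L)}{1 - \Phi(z_L)}, \quad z \ge z_L \tag{10}$$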
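As a point of reference for what specialized software computes, the following minimal sketch evaluates the exact LSTND cumulative distribution function from the standard normal one. It assumes the usual rescaling of the truncated distribution by the area remaining to the right of the truncation point; the function names (`z_score`, `phi_cdf`, `lstnd_cdf`) are illustrative and not taken from the paper.

```python
import math


def z_score(x: float, mu: float, sigma: float) -> float:
    """Transformation formula: map x on N(mu, sigma^2) to a standard z-score."""
    return (x - mu) / sigma


def phi_cdf(z: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lstnd_cdf(x: float, mu: float, sigma: float, x_l: float) -> float:
    """Exact CDF of a normal distribution truncated from the left at x_l.

    Assumption: the truncated CDF is the normal CDF shifted by the
    truncated area and rescaled by the area that remains.
    """
    if x <= x_l:
        return 0.0
    p_l = phi_cdf(z_score(x_l, mu, sigma))  # area removed by truncation
    return (phi_cdf(z_score(x, mu, sigma)) - p_l) / (1.0 - p_l)


if __name__ == "__main__":
    # Standard normal truncated at its mean (z_L = 0), evaluated at z = 1.
    print(lstnd_cdf(1.0, 0.0, 1.0, 0.0))
```

The sketch relies only on `math.erfc` from the standard library, so it can serve as the "true value" against which an approximation is compared.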
Practitioners addressing cases with LSTNDs consistently use specialized software (e.g., R) or specially designed programs. Khasawneh et al. [2] made an effort to tabulate the values of the cumulative distribution function of the truncated normal distribution. However, the z-score resolution of this table is coarse (0.1), especially when compared with the resolution of the original z-table (i.e., 0.001). Besides the resolution concern, the table is not popular and not publicly accessible. Evaluating a normal distribution is much easier than evaluating an LSTND for three reasons: 1) there are statistical tables for the cumulative distribution function at a very fine resolution, and these tables are popular and available to everybody; 2) the values of the cumulative distribution function are built into Excel spreadsheets, statistical software, programming languages, etc.; and 3) various works in mathematics, such as Cadwell [15], Polya [16], Lin [17], Hart [18], Hoyt [19], Hamaker [20], Lin [21], Lin [22], Aludaat and Alodat [23], and Bowling et al. [24], provide relatively accurate and straightforward mathematical approximations to the cumulative normal distribution function. However, the literature on approximating LSTNDs is limited, with only two approximating models identified by this study, and neither of these works provided a relatively accurate and straightforward model.

PROPOSED MODEL
This paper introduces an LSTND mathematical approximation that is both straightforward and accurate.
The approximation was developed based on Hart's two approximations to the cumulative normal distribution function [25][26]. In the subsection "Some notes on Hart's approximations", the two models introduced by Hart are presented and discussed in terms of applicability, simplicity, and accuracy. The subsection "The introduced model" introduces the approximation to the cumulative distribution function of the LSTND.

Some notes on Hart's approximations
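Hart introduced two approximations to the cumulative normal distribution function over the positive z-score domain. The first was introduced in 1957 and the second in 1966, as given in Equation (11) and Equation (12), respectively. Both are written for the complement of the cumulative normal distribution function, \bar{\Phi}(z) = 1 - \Phi(z).

$$\bar{\Phi}(z) \approx \frac{e^{-z^2/2}}{\sqrt{2\pi}\left(z + 0.8\,e^{-0.4z}\right)}, \quad z > 0 \tag{11}$$

$$\bar{\Phi}(z) \approx \frac{e^{-z^2/2}}{z\sqrt{2\pi}} \left[ 1 - \frac{\dfrac{\sqrt{1+bz^2}}{1+az^2}}{P_0 z + \sqrt{P_0^2 z^2 + e^{-z^2/2}\,\dfrac{\sqrt{1+bz^2}}{1+az^2}}} \right], \quad z > 0 \tag{12}$$

where

$$a = \frac{1 + \sqrt{1 - 2\pi^2 + 6\pi}}{2\pi}, \qquad b = 2\pi a^2, \qquad P_0 = \sqrt{\pi/2}.$$

Although both models were developed for the positive z-score region, we can obtain the approximation in the negative z-score region using the fact that Φ(-z) = 1 - Φ(z). We can also write the equations in terms of the cumulative normal distribution function Φ(z), instead of its complement, using the fact that \bar{\Phi}(z) = 1 - \Phi(z). Equation (13) and Equation (14) give the resulting forms of Hart's (1) and Hart's (2) models, respectively.

$$\Phi(z) \approx \begin{cases} 1 - \dfrac{e^{-z^2/2}}{\sqrt{2\pi}\left(z + 0.8\,e^{-0.4z}\right)}, & z > 0 \\[2ex] \dfrac{e^{-z^2/2}}{\sqrt{2\pi}\left(-z + 0.8\,e^{0.4z}\right)}, & z < 0 \end{cases} \tag{13}$$

$$\Phi(z) \approx \begin{cases} 1 - \bar{\Phi}_{2}(z), & z > 0 \\ \bar{\Phi}_{2}(-z), & z < 0 \end{cases} \tag{14}$$

where \bar{\Phi}_{2}(z) denotes the right-hand side of Equation (12). At z = 0, Φ(z) can be estimated using either the upper or the lower part of Equation (13) or Equation (14), as both give approximately 0.5.

Hart's (1) model is very simple, while Hart's (2) model is more complicated. On the criterion of simplicity alone, Hart's (1) model is the preferred basis for the model in this paper. To make the final choice, however, the accuracy of both models must also be measured. The accuracy was assessed as the deviation of the model results from the true results over the entire z-score domain. We are specifically interested in the maximum deviation.

From the previous discussion, we conclude that there is no need to investigate the deviation over the regions z > 4 and z < -4. Figure 2 shows the deviation of the two models over the range [-4, 4]. The maximum absolute deviation (error) is 0.0043 for Hart's (1) model, while it is 0.01282 for Hart's (2) model. Hart's (1) model is therefore clearly the more accurate of the two, in addition to being simpler.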
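The accuracy check described above can be sketched numerically. The snippet below assumes the commonly cited form of Hart's 1957 approximation (a tail of exp(-z^2/2) / (sqrt(2*pi) * (z + 0.8*exp(-0.4*z))) for positive z, mirrored for negative z) and scans the range [-4, 4] on a grid. The grid step and the function names are choices made for this illustration, not values taken from the paper.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def hart1957_cdf(z: float) -> float:
    """Approximate Phi(z) with Hart's 1957 formula (assumed form).

    The formula is defined for positive z; negative z uses the symmetry
    Phi(-z) = 1 - Phi(z). At z = 0 the lower branch is used here.
    """
    t = abs(z)
    tail = math.exp(-0.5 * t * t) / (SQRT_2PI * (t + 0.8 * math.exp(-0.4 * t)))
    return 1.0 - tail if z > 0 else tail


def exact_cdf(z: float) -> float:
    """Reference standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def max_abs_error(lo: float = -4.0, hi: float = 4.0, step: float = 0.001) -> float:
    """Largest absolute deviation from the exact CDF on a uniform grid."""
    n = int(round((hi - lo) / step))
    return max(
        abs(hart1957_cdf(lo + i * step) - exact_cdf(lo + i * step))
        for i in range(n + 1)
    )


if __name__ == "__main__":
    # The paper reports a maximum absolute deviation of 0.0043 for Hart (1).
    print(f"max |error| on [-4, 4]: {max_abs_error():.6f}")
```

Under these assumptions the largest deviation occurs at small |z|, which is consistent with the two error peaks on either side of the mean discussed later in the paper.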

The introduced model
In the previous subsection, we concluded that Hart's (1) model is simpler and more accurate than Hart's (2) model. Therefore, the LSTND approximation introduced in this paper is built on Hart's (1) model. As a first step, we need to estimate the probability density function, φ(z), of Hart's (1) model. Specifically, we have to differentiate Equation (13) with respect to the z-score, as stated in Equation (18). The result of the differentiation is given in Equation (19).
Substituting Equation (19) into Equation (8) allows us to find the probability density function φ_T(z) of the LSTND, Equation (20). The approximation to the cumulative distribution function of the LSTND can then be obtained by integrating Equation (20) over [z_L, z], as expressed in Equation (21). The resulting approximation is given in Equation (22).
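The chain of steps described above can be written out as a short derivation sketch. It is a hypothetical reconstruction rather than the paper's typeset equations: it assumes the commonly cited form of Hart's 1957 approximation for z > 0, and the auxiliary symbol g(z) is introduced here only for brevity. The negative branch follows by the symmetry Φ(-z) = 1 - Φ(z).

```latex
% Sketch only: g(z) is an auxiliary symbol introduced for this illustration.
\begin{align*}
g(z) &= z + 0.8\,e^{-0.4z}, \qquad
\Phi(z) \approx 1 - \frac{e^{-z^{2}/2}}{\sqrt{2\pi}\,g(z)} \quad (z > 0) \\[1ex]
% Differentiate the approximation with respect to z (quotient rule).
\phi(z) &= \frac{d\Phi(z)}{dz}
  \approx \frac{e^{-z^{2}/2}\,\bigl[z\,g(z) + g'(z)\bigr]}{\sqrt{2\pi}\,g(z)^{2}},
  \qquad g'(z) = 1 - 0.32\,e^{-0.4z} \\[1ex]
% Rescale the density by the area remaining after truncation at z_L.
\phi_T(z) &= \frac{\phi(z)}{1 - \Phi(z_L)}, \qquad z \ge z_L \\[1ex]
% Integrate the truncated density from the truncation point up to z.
\Phi_T(z) &= \int_{z_L}^{z} \phi_T(t)\,dt
  = \frac{\Phi(z) - \Phi(z_L)}{1 - \Phi(z_L)}
\end{align*}
```

In this form, the final expression needs only two evaluations of Hart's approximation, one at z and one at z_L, which is what makes the model usable with a hand calculator.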

INDUSTRY APPLICATIONS (QUALITY ENGINEERING)
The model presented in Equation (22) can be applied to different situations in industry to estimate the cumulative probability of the truncated normal distribution and the statistics associated with this distribution. Out of all areas, we have selected quality engineering to explain the model application. The main aim of quality engineering is to achieve good and affordable design, management control, and assurance of the quality performance of an organization's products and processes. Quality engineering was selected because every distribution that remains after screening out unfit products or parts is a truncated distribution. In detail, two values (i.e., LSL and USL) define the critical values between which products or services should operate. Customer requirements usually set these critical values.
For example, the total power of the x-ray used in security screening of luggage must stay close to a certain value. In this case, the two-sided limits mean that the target is in the middle between LSL and USL. In addition, lower-is-better parameters have only a USL; examples include defects or defect densities, weight, delays in time, costs, and power consumption. The target in this situation could be a specified low value that customers consider desirable, or it could be a general value such as half of the maximum specification limit. Similarly, a higher-is-better critical parameter has only the minimum specification limit, LSL; examples include efficiency, drops-to-failure, mean-time-to-failure (MTTF), device life, and resolution. Our model applies to the third case, where only a minimum specification limit exists. Assume, for example, that the purity of aluminum foil of 0.018 mm thickness follows a normal distribution. The purity is a critical parameter that affects the foil ductility. The acceptable purity limit for this thickness is 0.99 (i.e., the lower specification limit). Assume a quality engineer wants to estimate the chance of obtaining a foil purer than 0.993 after discarding the unfit foils, with the following parameters: μ = 0.992 and σ = 0.0005. The first step is to use the transformation formula to determine z_L and z, as follows: z_L = (0.99 - 0.992)/0.0005 = -4 and z = (0.993 - 0.992)/0.0005 = 2. By using Equation (22) with z_L = -4, the required probability can be estimated as 1 - Φ_T(2) = 0.022883. This value closely approximates the true value (i.e., 0.022751), and the deviation of the model result from the true value is only 0.00013. In the next section, we will discuss the model accuracy in terms of the deviation of its results from the true results as a function of z and z_L.
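The aluminum foil example can be reproduced with a few lines of code. This is a minimal sketch, assuming the commonly cited form of Hart's 1957 approximation and the usual rescaling of the truncated CDF by the area remaining after truncation; the function names are illustrative, and the exact reference value is computed with `math.erfc`.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def hart1957_cdf(z: float) -> float:
    """Hart's 1957 approximation to Phi(z) (assumed form), mirrored for z <= 0."""
    t = abs(z)
    tail = math.exp(-0.5 * t * t) / (SQRT_2PI * (t + 0.8 * math.exp(-0.4 * t)))
    return 1.0 - tail if z > 0 else tail


def exact_cdf(z: float) -> float:
    """Reference standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_cdf(cdf, z: float, z_l: float) -> float:
    """Left-truncated CDF built from any standard normal CDF function `cdf`."""
    if z <= z_l:
        return 0.0
    p_l = cdf(z_l)
    return (cdf(z) - p_l) / (1.0 - p_l)


# Parameters from the foil purity example in the text.
mu, sigma = 0.992, 0.0005
lsl, x = 0.99, 0.993

z_l = (lsl - mu) / sigma  # truncation point, about -4
z = (x - mu) / sigma      # value of interest, about 2

approx = 1.0 - truncated_cdf(hart1957_cdf, z, z_l)
exact = 1.0 - truncated_cdf(exact_cdf, z, z_l)

if __name__ == "__main__":
    # The text reports 0.022883 (model) and 0.022751 (true value).
    print(f"model: {approx:.6f}  exact: {exact:.6f}  deviation: {approx - exact:.6f}")
```

Passing the CDF as an argument to `truncated_cdf` is a design choice made here so that the approximate and exact results come from the same truncation logic.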

ANALYSIS OF ACCURACY
In this section, the model error is discussed as a function of z and z_L. We define the model error as the deviation of the model results from the true results. Figure 3 shows the 3-D response surface of the error with two factors, z and z_L. It is noteworthy that the model error tends toward its minimum when moving far from the mean. For example, the model error is almost zero at z = -5 and z = 5, regardless of the value of the truncation point. For more clarification, Figure 4 presents the model error versus z-score over the domain [z_L, 4] for five different truncation points (i.e., z_L = -4, -3, -2, -1, and 0). We can clearly notice that there is a peak in the positive z-score range and another peak in the negative z-score range. For z_L = 0, the peak in the negative z-score region is removed together with the truncated region. The maximum absolute error is given by the higher peak for each truncation point. The maximum deviations at the different z_L values are as follows: 0.004303 for z_L = -4, 0.00432 for z_L = -3, 0.00449 for z_L = -2, 0.00573 for z_L = -1, and 0.0106 for z_L = 0. Overall, the maximum deviation increases with z_L. For example, the maximum absolute error at z_L = 0 is about 2.5 times the maximum absolute error at z_L = -4. In this paper, the maximum truncation point is z_L = 0, as it is rare to find an application with a truncated area of more than half of the original area. Even so, the error at z_L = 0 is still negligible for most probability and statistics applications, including quality engineering. The maximum error of the logistic-based approximation (a model with a similar purpose) is over 0.02, and its author claims that the difference cannot be noticed in most industrial applications. Figure 5 shows the relationship between the maximum absolute error and z_L. The increase becomes more rapid as z_L increases; in other words, the curve is concave up.
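The maximum errors listed above can be checked with a simple grid scan. The sketch below assumes the commonly cited form of Hart's 1957 approximation and the rescaled truncated CDF. Two further assumptions are made for this illustration: the grid step of 0.001, and the use of the lower (negative-side) branch of the approximation at exactly z = 0, which is the choice that reproduces the reported value for z_L = 0.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def hart1957_cdf(z: float) -> float:
    """Hart's 1957 approximation to Phi(z) (assumed form).

    Assumption: the lower branch is used at exactly z = 0.
    """
    t = abs(z)
    tail = math.exp(-0.5 * t * t) / (SQRT_2PI * (t + 0.8 * math.exp(-0.4 * t)))
    return 1.0 - tail if z > 0 else tail


def exact_cdf(z: float) -> float:
    """Reference standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_cdf(cdf, z: float, z_l: float) -> float:
    """Left-truncated CDF built from a standard normal CDF function."""
    if z <= z_l:
        return 0.0
    p_l = cdf(z_l)
    return (cdf(z) - p_l) / (1.0 - p_l)


def max_abs_error(z_l: float, hi: float = 4.0, step: float = 0.001) -> float:
    """Largest |model - exact| for the truncated CDF on a grid over [z_l, hi]."""
    n = int(round((hi - z_l) / step))
    worst = 0.0
    for i in range(n + 1):
        z = z_l + i * step
        err = abs(truncated_cdf(hart1957_cdf, z, z_l) - truncated_cdf(exact_cdf, z, z_l))
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    # Truncation points examined in the text.
    for z_l in (-4.0, -3.0, -2.0, -1.0, 0.0):
        print(f"z_L = {z_l:+.0f}: max |error| = {max_abs_error(z_l):.6f}")
```

A finer grid would sharpen the located maxima slightly, but the step used here is already much smaller than the width of the error peaks.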

CONCLUSION
In many situations, quality engineers are forced to deal with the LSTND. Handling these kinds of distributions cannot be done manually, and engineers usually use specialized programs or software. However, using these programs or software is not practical in all cases. In many cases, an engineer needs to perform calculations based on this distribution only once or a few times. This paper introduces a mathematical approximation that allows engineers to manually calculate estimates for the left-sided truncated normal and standard normal distributions using the probability density function and the cumulative normal distribution function. The model is relatively straightforward and can be used with a simple hand calculator; however, the study recommends using an Excel spreadsheet to reduce clerical errors.