See also in:
Design of Experiments (DOE) is a very useful process improvement methodology. Microsoft Excel has some powerful data analysis tools which I have successfully used for DOE.
Factorial Experiment Example
We want to minimize process loss and after some brainstorming among the process specialists we concluded that 5 factors may affect process loss. Based on current factor levels we have selected the following levels to experiment.
Download this Excel file:
Full factorial designs are constructed simply counting in binary to obtain all the different combinations of 0 and 1 for all factors. In our designs we just replace 0 with -1 to meet the balance property.
1/2 Fraction Design
With our 5 factors to run a full factorial set of experiments we would need 2 ^ 5 = 32 experiments. This could involve time and money before we are sure that all factors really affect our process. Therefore we might start with a subset of the full factorial as a detection experiment. A 1/2 fraction will involve 1/2 the number of experiments: 16.
We can build a 1/2 fraction 5 factor design from a 4 factor full factorial:
We do this by replacing the 4 factor interaction (which is very unlikely to happen) with the 5th factor (Flow). We construct therefore the Flow column multiplying the other 4 columns.
Run the Experiments
Now with this design we would replace the coded values ( 1 and -1) with the high and low values we selected for each factor and run the experiments in our process.
In this example we are going to use a simulator to run our experiments:
In this simulator we run the experiments with the factors in coded form.
We have added 2 additional experiments with the values called central points (all zeros) which will correspond to the average value of the high and low levels in each factor.
We do this to make a more robust design without going all the way to the full 32 experiments.
The simulator has provided the process loss outputs for each experiment.
We want to be able to estimate the influence of each individual factor in the result but also the interactions among them so before we analyse the results we will build a column for each interaction among 2 factors. If we wanted to include 3 factor interactions we would also add them to this matrix.
You obtain each interaction column just by multiplying the corresponding factor columns. Just enter the formula in the first line and replicate all the way down.
Regression with Excel Data Analysis
Enter in Excel: Data/ Data Analysis/ Regression
And the result will be
We notice that R^2 = 0.99 which validates our mathematical model.
The Probability column (p) tells us which factors and interactions are significant: those below 0.05.
Factors Temp and Press have a p value above 0.05 but their interaction is below (0.02) therefore we must keep them.
We run the Regression again keeping only the factors and interactions not removed:
This gives us the formula for the estimation of Process Loss as a function of 3 factors (out of the original 5 we thought relevant):
Loss = 20.24 + 6.85 pH -1.4 Temp - 0.06 Pres + 4.41 Temp * Pres
This equation is with factors in coded values.
We now want to calculate with this mathematical model of our process which are the pH, Temp and Pres values which will minimize Process Loss.
In order to do this we will use Excel Solver.
We add cells N17:N21 to house the results we are looking for. N17 is the factor to multiply with the interception so it must be 1. Cell N21 should have de product Temp*Pres so we input the formula (=N19*N20).
So we will just ask Solver to give us the values for the yellow cells.
In N23 we put the formula to calculate loss with the coefficients in column I: (=SUMPRODUCT(I17:I21;N17:N21)
We now call Solver to minimize this value changing pH, Temp and Pres with the restrictions that they must be between -1 and +1.
The result is that the minimum loss of 7.64 can be obtained with pH = 2 (coded -1), Temp = 150 (coded 1) and Pres =1 (coded -1)
We would now run confirmation experiments to validate these optimal values. With our simulator:
We obtain a confidence interval for the process loss with our optimal factor values.
We can view a graphical representation of our mathematical model to help us understand the process behaviour:
Entering a value for pH we can see the Loss for each Press-Temp combination.
With the color code we clearly identify the minimum value in dark green: 7.6 which corresponds to Press = -1 and Temp = 1 as calculated by Solver.
Which in 3D would be:
In spite of the curvature of this shape the behaviour of each factor is linear. The twisting of the shape is due to the interaction between Pressure and Temperature.
Residue Analysis to Validate the Model
We want to estimate the error we make by replacing our process behaviour with this mathematical model.
We subtract the actual Loss values measured in our process from the estimates of our formula to obtain the residuals.
We see no trends of the residuals along the estimate range so this is OK.
Test for Normality of the Residuals
The next check is to make sure that the average of the residuals is close to 0 and they follow a normal distribution:
If the P value for this normality check was below 0.05 this would indicate lack of normality. Since it is 0.983 well above, it passes the test.
Test for Stability Along Time
We want to check that the process has not significantly changed during the time of our experiments:
No out of control alerts appear in this SPC chart of the residuals so no evidence of significant changes.
Notice that we have a timestamp next to each residual. That would be the timestamp of the moment each experiment was performed. It is essential to collect these timestamps to investigate any anomalies detected during the analysis.
- Design of Experiments (DOE) is a powerful methodology for process improvement
- It enables the identification of critical process factors based on data rather than impressions
- We can estimate the optimal values of these critical factors to optimize the process
- Excel provides some useful Data Analysis tools to achieve this
- Some processes outputs don't have a linear relation with the critical factors. In this case we will need a more sophisticated formula such as RSM: