Design of Experiments is a useful methodology for process improvement. The purpose is to find a relationship between process variables and key process outputs.
I find it useful to use Excel statistical analysis tools, Solver, Pivot charts, etc. to plan and analyse the results of these experiments.
The first approach is to look for a linear relationship as shown in:
But in some cases this relationship may not be linear, in this case we will try a quadratic model with Response Surface Analysis.
We will use an example in this Excel file you can download:
In this example we are trying to maximise process yield acting on the critical factors pH, Temperature and Time.
We will run the experiments in the Experiments simulation sheet using coded values.
We start by running a full factorial experiment with 2 central points:
We now compute the 2 way interactions by multiplying the corresponding columns:
We now use Excel Data Analysis/ Regression:
This is a confirmation that this linear mathematical model is not valid: R^2 adjusted is very low and all p values for all factors and interactions are above 0.05.
We suspect curvature so we look to the average yield values for the pH inputs with a Pivot Chart:
We can, clearly, see the lack of linearity: Yield for pH = 0 is totally out of line with the -1 and +1 values.
Response Surface Designs
In order to approximate a quadratic model we need two additional values for each factor. In our case of 3 factors we have 2 possible designs:
Both CCD and CCI designs involve the same number of experiments (20) but in the case of CCD the two additional values go beyond the values of 1 and -1 we originally wanted to experiment.
In the following table we can see the correspondence between coded and uncoded values for each factor:
We can not run CCD because the resulting pH values are physically impossible (pH varies between 0 and 14)
Therefore we will run a CCI design for 3 factors.
Response Surface CCI Experiments
By loading the coded CCI design in our simulator we obtain:
Analysis with Solver
We are going to use two alternatives for the approximation of a quadratic formula to our data: Solver and Regression.
We start with the Solver alternative:
The purpose of the analysis is to find a quadratic mathematical formula to approximate the behaviour of our process.
This formula will have as terms an independent value, the 3 factors each with a coefficient, 2 way interactions and the squares of each factor with their coefficients as shown from J2 to R2.
To use Solver we start by defining where we want Solver to store our results (the coefficients): they will be the yellow cells I3 to R3.
Assuming we already have the results we estimate yield for each experiment in column F with a formula in F3 which we will replicate all the way down. In this formula we use the corresponding factor values in columns B C and D with the coefficients in I3 to R3.
In column G we enter =(actual yield - estimated yield) ^ 2
In G23 we have the sum of the G column.
Now we ask Solver to calculate the values from I3 to R3 to make G23 a minimum.
Then we obtain the coefficients we were looking for in I3 to R3.
Optimization with Solver
Now that we have the formula to estimate Yield from pH, Temperature and Time we want to find the values that will give the maximum yield:
We will again use Solver this time to find these optimal values. Now we will use the coefficients in I3:R3 we just calculated and the values in I13 to R13. The only values we are asking Solver to calculate are J13 to L13. All the other values for interactions and squares are calculated from these with a formula in the corresponding cell. In I13 we put 1 that will be multiplied by the independent term in I3.
In K15 we put the formula for yield which is just the SUMPRODUCT of the coefficients in I3:R3 and the values in I13:R13
We ask Solver the maximize Yield (K15) with the values of our 3 factors in J13:L13. With the constraints that they should be between -1 and +1.
We obtain the optimal coded values of pH = 0.26, Temperature = 0 and Time = 0.2 and this will give us a Yield = 22.92
Analysis with Regression
Another Excel alternative is to approximate the quadratic formula using regression.
In order to do this we need to build the table of interactions and squares just by multiplying the corresponding columns of the factors.
We can now go to Analysis/ Regression:
Now we can see excellent R^2 values and see which factors and interactions are significant (p < 0.05).
Removing non significant terms:
We obtain the coefficients similar to those we calculated with solver.
Optimize Yield with Solver
We can again use Solver to maximize yield:
We have a more simplified formula because Temperature was not significant:
Yield = 20.31 + 15.05 pH + 6.68 Time - 24.81 pH^2 -12.17 pH * Time - 9.11 Time^2
This formula is for coded factor values.
Response Surface Graph
Let's have a look at the shape of this formula:
Now we can understand why the linear model was unable to find the optimal yield: we were trying to represent this with a flat surface experimenting only with +1 and -1.
Another way to see it:
We are assuming that this mathematical formula represents our process but there are some additional checks we should do to validate our assumptions.
In order to do this we are going to analyse the differences between the actual results of our experiments and the estimated values using our formula:
Comparing the residuals with the estimated values we should not detect any trends.
We expect the residuals to follow a normal (Gaussian) distribution with an average of zero:
The test for normality gives us a p = 0.61 > 0.05 so it is OK (no evidence of lack of normality)
Test for Trends in the Residuals
No trends have been detected along the time the experiments were run (see the timestamps of each experiment)
Now we run confirmation runs with our calculated optimal values for pH and Time and we obtain an average yield of 22.17 with a confidence interval 21.03 - 23.31
- Design of Experiments can be used to approximate linear or quadratic models to our process response in order to understand its behaviour
- It allows the identification of critical factors affecting our outputs
- We can then find the values of these factors to optimize these outputs
- All this analysis can be performed with Excel Statistical Analysis tools, Solver, Pivot Charts, etc.