Statistical Process Control is used to detect when a significant change has taken place in a process.
All processes show variation in their metrics, but not all of that variation is significant from the statistical point of view.
When we want to control a process by adjusting some process parameter we can make two mistakes:
- Over-react by adjusting when we shouldn't
- Under-react by failing to adjust when we should
A case of over-reaction is illustrated by the following example:
Someone is shooting at a target and, based on the deviation of each impact, adjusts the gun sight after every shot.
The end result will be an increase of the dispersion of the impacts, therefore the adjustments will make the process worse.
The correct way is, of course, to fire 5 or 6 shots without any adjustment and only then decide whether an adjustment is required, based on the center of the group of impacts.
This is an example of under-reaction:
If you drop a live frog in boiling water it will immediately jump out to save its life.
But if you put it in a pot of cold water and heat it the frog will eventually pass out without any reaction.
Many companies have fallen into this trap:
- A big disaster generates a quick and effective reaction and the company recovers
- A slow degradation of their KPIs, such as customer satisfaction, passes undetected until it is too late
Statistical Process Control
To avoid these two mistakes we can use SPC to control a metric to find out if the observed changes are SIGNIFICANT from the statistical point of view.
We will illustrate this with the use of an example: control your own weight to check if your diet is leading to your weight target or not.
You can download this example Excel file and replace the data in columns A and B with your own data:
You can decide the frequency of your data collection. In this example I have collected it daily.
It is important to collect it at the same moment (more or less) each day.
You can keep your Excel file on your smartphone or in the cloud (Google Drive or Microsoft OneDrive).
I am assuming our target is to reduce weight, therefore a downward trend will put the numbers in green and an upward trend will put them in red.
SPC uses a set of rules, developed at the Western Electric Company, to detect symptoms of SIGNIFICANT trends. They are shown below with a screenshot of Minitab's SPC chart for individuals:
The center line is the average of all the values, and the standard deviation is also estimated from all the values.
In our Excel file upward trend symptoms are in red and downward in green.
The Excel file applies rules 1, 2, 3, 5, and 6 of this list.
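As a rough illustration of how two of these rules work, here is a minimal Python sketch. The center and sigma are supplied explicitly and the readings are hypothetical; real SPC software applies the full rule set and estimates sigma from moving ranges:

```python
def western_electric_flags(data, center, sigma):
    """Flag points violating Rule 1 (a point beyond 3 sigma) or the
    '4 of the last 5 points beyond 1 sigma on the same side' rule
    used in the weight example. A sketch only: commercial SPC tools
    apply more rules and estimate sigma differently."""
    flags = []
    for i, x in enumerate(data):
        rule1 = abs(x - center) > 3 * sigma
        last5 = data[max(0, i - 4): i + 1]
        trend = len(last5) == 5 and (
            sum(v < center - sigma for v in last5) >= 4 or
            sum(v > center + sigma for v in last5) >= 4)
        flags.append(rule1 or trend)
    return flags

# Hypothetical daily weights with center 79.0 kg and sigma 0.5 kg:
# the last two readings complete a '4 of 5 below 1 sigma' downward trend.
readings = [79.2, 78.9, 78.4, 78.3, 78.4, 78.2, 78.1]
print(western_electric_flags(readings, center=79.0, sigma=0.5))
```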
Weight data interpretation
On 2/4/2018 the weight 78.8 is in red, and we see the explanation with a number 2 in column N: two values (1/4/2018 and 2/4/2018) of 78.8 kg are 2 σ above the average. This is past data; what really interests us is the data we have just collected.
The numbers start turning green on 8/4 with 78 kg, because out of the last 5 values, the last 4 are 1 σ below the average. This is interpreted as a SIGNIFICANT downward trend, which continues over the following days.
A statistically significant trend doesn't mean that this rate of reduction is acceptable, but at least we are moving in the right direction.
Results analysis with Minitab
In our Excel file there is no chart: we just want to be alerted of significant trends.
Looking at this same data in Minitab we can see an SPC chart:
The red dots indicate trends. Looking at the graph, we notice the first two are upward and the last three downward, so the overall trend is downward.
There is a common misconception about the meaning of process stability.
For instance some might call the process characterized by the following data unstable:
But if we look at our SPC:
It doesn't give us any alert of instability.
This is a STABLE process consisting simply of throwing dice. Since we have always thrown dice the same way SPC is telling us that this is a stable process: no SIGNIFICANT change has been detected.
Stability is not always a good thing. In the case of our weight control stability would mean that there is no improvement.
We want stability when the process is OK. If we improve this will show with SPC alerts in the right direction.
Use for other processes
This Excel file can also be used at work, to control variables other than weight. For instance:
In this case we are doing an hourly control of our process yield. Here a Yield increase is good, so Red is Good.
At 20:00 we have enough evidence of a Yield increase although the change seems to have taken place at 16:00.
SPC enables the right detection of process change.
High intrinsic process variation can lead to over-reaction (adjusting when you shouldn't) or under-reaction (not reacting when you should).
In spite of the high-tech equipment on our shop floors, you can still see pencil-and-paper data collection being used.
Data collection on the shop floor is a non-value-add operation which is nevertheless still required to control the process.
This collection is often automatically done by the manufacturing equipment with no operator intervention.
Unfortunately, detecting visual (aspect) defects is often difficult to automate, even with AOI (Automatic Optical Inspection).
When an operator is required to inspect we need a method to enter the results into the system.
Line incidents, change of critical process parameters, etc. also need to be reported.
The method still widely used in this 21st century is:
- The operator hand writes the defect data on a form
- Forms are collected at the end of the shift
- A clerk types the forms contents into the system at a later stage (next day or later)
- The engineer analyses the results and takes corrective actions
This reporting process creates a number of problems:
- It is time consuming for the operator
- It adds a non-value-add operation: transcribing the data
- The operator’s handwriting is not necessarily good (this criterion is not used for operator selection)
- Reporting errors may be introduced both by the operator and the clerk doing the transcription
- Some of these errors may be impossible to recover from by the time they are detected at transcription
- Real time process control is impossible: the engineer detects a problem at least 24h after it happened.
- Maybe the line has been producing bad product all this time.
The alternative to the pencil & paper data collection is, of course, directly reporting defects in a terminal the moment they happen.
You can provide terminals in the inspection and test stations to do that.
Sometimes it is difficult to provide individual terminals to all the operators who need to report. Sharing terminals and having to walk to the nearest terminal is time consuming so it may not be acceptable.
The reporting operation should be neither time consuming nor error prone. If we ask operators to type 12-digit numbers or time stamps, the chances of error are very high.
So we need a reporting system which is both simple and error free.
Typing long numbers can be replaced by a bar code reader whenever possible. We can also use the data validation features provided by Excel:
In this way reporting errors can be detected the moment they are entered and corrected on the spot.
Pull-down menus may be used for each field to avoid typing and to ensure everyone uses the same wording.
The current time stamp can be provided automatically with the formula =NOW()
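The same at-entry validation logic can be sketched in a few lines of Python; the defect list and field names here are purely hypothetical:

```python
from datetime import datetime

# A hypothetical pull-down list of defect types (the Excel equivalent
# is data validation on the cell): only known values are accepted.
ALLOWED_DEFECTS = {"missing", "misplaced", "solder bridge", "cold joint"}

def record_defect(defect_type, station):
    """Validate the entry the moment it is made and attach the
    current time stamp automatically (like =NOW() in Excel)."""
    if defect_type not in ALLOWED_DEFECTS:
        raise ValueError(f"unknown defect type: {defect_type!r}")
    return {"time": datetime.now().isoformat(timespec="seconds"),
            "station": station,
            "defect": defect_type}

print(record_defect("solder bridge", "wave-1"))
```

Because validation happens at entry time, a mistyped defect name is rejected on the spot instead of surfacing a day later during transcription.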
Reporting on the cloud
Having a terminal for each operator who needs reporting may not be possible, in which case reporting on the cloud may be an alternative.
The data base can be created in a Google Drive spreadsheet:
Reporting may be done with any tablet or smartphone via Wi-Fi.
BYOD (Bring Your Own Device) alternative
Operators may have difficulties with readable handwriting or with reporting in a terminal, but they are normally experts at typing on their own smartphones. This existing skill can prove very useful for real-time online reporting.
The database may be on the Internet or on the company Intranet.
The smartphones used may be the operator’s own devices or ones supplied by the company.
It obviously requires appropriate firewalls to keep viruses out.
Even a smartphone with no telephone line may be used for access via Wi-Fi (an old, out-of-use smartphone might do the job).
To avoid corrupting the database each workplace may have a dedicated input worksheet in the cloud and then data is automatically transferred to a central worksheet. Operators need write access to their input worksheet and only read access to the central worksheet and reports.
Real-time feedback to the workplace
Real-time data collection enables real-time feedback to the operators in order to take corrective actions while it is still possible.
This Andon display provides real-time feedback:
- PTH component insertion DPMO (red means above 1500 target)
- PTH component solder DPMO (green: below 190 target)
- Processed cards/hour.
SPC (Statistical Process Control) charts are another alternative for providing feedback to the operators in real time.
When this feedback is in real-time it is possible to relate changes in the output to what just happened in the line: change in process parameters, line incidents, etc.
This makes it possible to get to the root cause of problems and often solve them on the spot.
Pencil and paper data collection makes it impossible to provide this real-time feedback: all we have is after-the-fact compound data. In this case it will be difficult to find the root cause of the problem and therefore the problem will often remain unresolved. All we can do in this situation is guess what happened and try to find a justification for management.
With this simple value stream to produce products A and B we want to illustrate the way to maximize profit taking into account the system constraints.
· Each product is made by assembling two subassemblies, one of them common to both products.
· We have three special purpose workstations with a corresponding operator trained for the specific job.
· The line operates one shift with 2400 productive minutes per week.
· The fixed costs, which include the salaries of the three operators, are 8000 € per week.
· Product A is sold for 90€ and B for 100€.
· The maximum weekly market demand is 200 units for A and 100 for B
· Total direct costs (materials) for product A is 50€ and for B 40€ as shown above.
· We want to know how many products A and how many B we should produce to maximize profit.
First of all let’s look at the system constraints:
- Time available per week for each of the three operators is 2400 minutes
- Weekly market constraints: 200 A and 100 B.
We want to maximize overall profit so we calculate the margin of each product:
Since we can’t satisfy all the market demand we must decide which product should be our priority. If we ask Finance they will tell us B is the best, since it has the highest margin.
Let’s see how many resources we will need to produce all the market demand for B and how many A’s we will be able to produce with the remaining resources:
With the remaining manufacturing capacity we would be able to manufacture 140 A’s and assemble 380 A’s, but we can only test 40 A’s. It makes no sense to manufacture more A’s than we can test, and the excess assembly capacity will be wasted because there is no point assembling units that cannot be tested. This means that in fact we will only be able to produce 100 B’s and 40 A’s, which gives us a total loss of 400€ per week.
Maybe the Finance criterion of giving priority to the product with the highest margin was not so good after all.
We have already noticed that the bottleneck in this line is Test: it is limiting the effective capacity of the whole line.
The use of bottleneck time is therefore a critical factor we should consider in optimizing the line.
We need to ask ourselves how much bottleneck time is consumed to obtain a 40€ margin in product A, and the same for B.
This means that each minute of test time (bottleneck) if used to test product A gives us a margin of 4€ while B only gives us 3€.
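Writing out the arithmetic, with the margins and test times from the example:

```python
# Margin per minute of bottleneck (test) time, using the example's figures
margin_A = 90 - 50            # selling price - materials = 40 EUR
margin_B = 100 - 40           # = 60 EUR
test_A, test_B = 10, 20       # minutes of test (bottleneck) time per unit
print(margin_A / test_A)      # EUR of margin per bottleneck minute for A
print(margin_B / test_B)      # EUR of margin per bottleneck minute for B
```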
So when we take into consideration the existing constraints A is more profitable. Let’s give priority to A and use the remaining capacity to produce B:
So the bottleneck plays an important role in finding the optimal solution: it defines the effective capacity of the whole line, therefore we need to optimize the use of bottleneck time.
We can clearly see that due to the existing constraints this line is very inefficient. Let’s look at the capacity utilization of each operation:
The only operation with 100% utilization is test. This is what we would expect since it is the bottleneck. We see that final assembly has very low utilization (less than 50%).
We can see that the only way to improve productivity further is to reduce test time, either by making the test more efficient or by transferring some work (if possible) to the assembly operation.
Let’s now look at the problem graphically:
The red area is the forbidden area due to the constraints: maximum A, maximum B and the constraint of the 2400 minutes available at the bottleneck (test): each A consumes 10 minutes of test and each B 20 minutes.
We now look for the point of maximum profit: it will be the furthest point from the origin within the allowed (green) area, which is the point of maximum A (200 units) plus 20 units of B.
Solution with Solver
Now let’s see how to solve the problem with Microsoft Excel Solver. First we need to set all the data in a spreadsheet:
Download file: Solver1.xlsx
In matrix B3:D4 we have the unit times for each operation in products A and B.
Column E has material costs and F product selling price. In column G (margin) we need a formula which just subtracts F – E.
In column H (Produced) we want Solver to place the results we are looking for: number of A’s and B’s to produce.
Column I has the market constraints for A and B.
Row 5 has the formulas shown above to calculate the time consumed on each operation, based on the quantities produced. These values will be compared with row 6 (the available time for each operation).
Finally B10 has the shown formula to calculate the profit. This is the cell we will ask Solver to maximize.
Now we go to Solver to setup the parameters as shown:
We want to maximize Profit (B10) by changing cells H3:H4, where Solver will place the solution we are looking for.
Now the constraints:
- B5:D5 ≤ B6:D6 Used times ≤ Available times
- H3:H4 ≤ I3:I4 Don’t produce above market demand
- H3:H4 Must be integers and not negative
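The optimum can also be cross-checked with a brute-force search in Python. The margins and the test and market constraints come from the example; the 10-minute manufacture times and 5-minute assembly times per unit are an assumed reconstruction consistent with the figures in the text:

```python
# Exhaustive search over integer production mixes of A and B.
# Margins (from the text): A = 90 - 50 = 40 EUR, B = 100 - 40 = 60 EUR.
best = max(
    ((a, b, 40 * a + 60 * b - 8000)      # profit after 8000 EUR fixed costs
     for a in range(201)                  # market limit: 200 A per week
     for b in range(101)                  # market limit: 100 B per week
     if 10 * a + 10 * b <= 2400           # manufacture minutes (assumed)
     and 5 * a + 5 * b <= 2400            # assembly minutes (assumed)
     and 10 * a + 20 * b <= 2400),        # test minutes (the bottleneck)
    key=lambda t: t[2])
print(best)
```

The search space is tiny (201 × 101 mixes), so enumeration is enough here; Solver's LP engine is what you would use for larger problems.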
Press Solve and we will get the results:
Now imagine that an engineer has found a way to reduce test time of product A from a total of 10 minutes to 9 minutes by adding some work to the assembly operation which would involve an increase from the previous time of 5 minutes to 7.
This suggestion would be difficult to justify with Finance, since we are increasing the overall operator time for product A by one minute.
But let us look at our Excel to see what Solver has to say:
We are now able to produce 10 more B’s, increasing the profit from 1200€ to 1800€.
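A brute-force sanity check of this scenario in Python: the 9-minute test and 7-minute assembly times for A are the modified figures from the text, while the remaining per-unit times are assumptions consistent with the example:

```python
# Exhaustive search with the modified times for product A
# (test reduced from 10 to 9 minutes, assembly increased from 5 to 7).
best = max(
    ((a, b, 40 * a + 60 * b - 8000)      # profit after 8000 EUR fixed costs
     for a in range(201) for b in range(101)
     if 10 * a + 10 * b <= 2400           # manufacture minutes (assumed)
     and 7 * a + 5 * b <= 2400            # assembly minutes
     and 9 * a + 20 * b <= 2400),         # test minutes (the bottleneck)
    key=lambda t: t[2])
print(best)
```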
This is telling us that the way we compute product profit may not help when some of the operations have idle time.
Container distribution example
Now let’s see another example of Theory of Constraints solution with Solver:
In this case we want to minimize the cost of distribution of goods containers from warehouses X, Y, Z to shops A, B, C, D, E, F:
Download file: Solver2.xlsx
Matrix C4:H6 contains the transport costs of one container from each warehouse to each shop.
Matrix C9:H11 will contain the number of containers from each warehouse to each shop we are looking for.
The blue cells have the formulas shown.
The Solver parameters will be:
Notice we must meet all the shops’ requirements: C12:H12 = C13:H13
And the resulting solution:
This is the optimal solution which meets all the constraints.
- All Value Streams are subject to multiple constraints, such as market demand and the available capacity of each operation.
- The optimal operating point very much depends on the existing constraints which may vary along time.
- Microsoft Excel Solver is useful to optimize a Value Stream which is subject to multiple constraints.
Optimize both Yield and Cost with Minitab Design of Experiments
Design of Experiments is a powerful technique to optimize process outputs in complex processes where multiple factors affect the outputs. When we want to optimize more than one output the problem becomes more complex.
With this example we want to follow the steps, using Minitab, to optimize both process yield and cost by acting on 3 critical factors which have been identified by previous experiments.
These are the factors and levels we want to experiment with:
We will start with a full factorial experiment with 2 central points:
The resulting design is:
We are going to use a simulator in Excel to perform the experiments which you can download: Yield & Cost Simulator
We will just copy columns C5 - C7 from Minitab to the corresponding factor columns in the simulator:
We now copy the results of the experiments (columns Yield and Cost) back to the Minitab sheet in C8 - C9:
Now we start to analyze these results of our factorial experiment with Minitab:
These are the results for Yield:
This is telling us the factorial model is not valid: R-sq(adj) is very low (9.55%). R-sq(adj) is the proportion of variability explained by this factorial mathematical model.
If we look at the p-values none of the factors or their interactions are significant.
We suspect curvature, in spite of its p-value being slightly above 0.05.
To confirm curvature we can look at the graphs of main effects both for Yield and Cost:
This clearly shows curvature for both Yield and Cost, because the center points are totally out of line with the corner points.
To further confirm curvature we repeat the factorial analysis including central points:
By including the central points in the model, R-sq(adj) has improved a lot (97.97%). This confirms we need quadratic terms in the formula, but these cannot be calculated with a factorial experiment: we need a Response Surface experiment to model this process.
Response Surface Experiment
We design a Response Surface experiment with the default values for 3 factors.
But be aware that we must define our factor values as axial points to avoid physically impossible values, such as a negative pH.
The resulting response surface design is:
To run the experiments we copy columns C5-C7 to our simulator:
And copy the results of the experiments (Yield and Cost) back to the new Minitab worksheet:
We now perform the analysis of the Response Surface Design:
With the following results for Yield:
The R-sq(adj) value of 96.85% confirms the validity of this model, and almost all the p-values are below 0.05: only the Temperature*Time interaction and Temperature^2 are not significant. Before we remove these from the model we must make sure they do not significantly affect the other output variable we are trying to optimize:
Now we confirm that neither Temperature^2 nor Temperature*Time significantly affects Yield or Cost.
Therefore we will remove both from the model:
The resulting reduced model for Yield is then:
This is the formula to estimate Yield as a function of Temperature, Time and pH.
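As a sketch of how such a reduced model can be evaluated in code: the model keeps the linear terms, Time^2, pH^2, and the Temperature*pH and Time*pH interactions, but the coefficients below are placeholders, not Minitab's actual output:

```python
# Placeholder coefficients -- substitute the values from Minitab's
# coefficients table. Only the terms kept in the reduced model appear
# (Temperature^2 and Temperature*Time were removed as non-significant).
COEF = {"const": 50.0, "T": 1.2, "t": 0.8, "pH": 3.0,
        "t^2": -0.05, "pH^2": -0.4, "T*pH": 0.02, "t*pH": 0.1}

def predicted_yield(T, t, pH, c=COEF):
    """Evaluate the reduced quadratic response-surface model."""
    return (c["const"] + c["T"] * T + c["t"] * t + c["pH"] * pH
            + c["t^2"] * t ** 2 + c["pH^2"] * pH ** 2
            + c["T*pH"] * T * pH + c["t*pH"] * t * pH)

print(predicted_yield(T=100, t=30, pH=5))
```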
The reduced model for Cost is:
We can now see graphically the impact of the 3 factors in Yield with Contour Plots:
And also for Cost:
We can see graphically the optimal areas to maximize Yield and to minimize Cost. Unfortunately, these areas do not coincide.
Now we will let Minitab calculate the optimal factor values to achieve our overall optimization:
We have decided to assign double weight to Yield vs Cost. The result is:
This gives us the optimal factor settings to obtain a predicted Yield of 156 and a Cost of 97.
The following interactive graph lets us touch up the factors and see the results of both outputs:
The optimal factor values have been obtained by maximizing the Composite Desirability curves which are based on our condition that Yield is twice as important as Cost.
If we wanted to maximize Yield at any cost we would obtain:
This improves Yield, but it also increases Cost.
With this example you can experience the power of Minitab to design and analyze experiments in order to optimize several outputs by calculating the optimal factor settings.
The quality level seen by the customer of our process is one of the most critical process metrics. This quality level is the result of all the defects generated by the different process steps as well as those coming from the parts suppliers. In our Value Stream Map we focus on process flow but this flow is very much affected by quality. It is important to find out where in the process each type of defect is generated and where in the process it will be detected to make sure it doesn’t find its way to the customer.
In this model you can download DPUdpmo.xlsx we are simulating a simplified map of an Electronic Assembly and Test Process.
Defect contributors are defective components coming from suppliers and defects from the 4 process steps:
- SMT (Surface Mount Technology) automatic component placement
- Reflow solder of these components
- Manual PTH (Pin Thru Hole) component insertion
- Wave soldering of PTH components.
Defects are detected on:
- Several visual inspections along the process
- ICT (In-Circuit Test) at the end of the assembly process
- Final Functional test.
“Defects per unit” (DPU) is the metric that interests the customer but it’s not very practical as a process metric because we are typically producing products with very different levels of complexity: some boards may have just 5 components and others 500.
“Defects Per Million Opportunities” (DPMO) is a useful process metric that is independent of the product complexity. This metric will allow us, for instance, to know if our soldering process is improving or getting worse with independence of the complexity of the products we are soldering.
In the following example: if we have one defective component in 3 boards the average DPU = 1/3 = 0.33
When we measure in DPMO we must define the OFEs (Opportunities For Error) of the process we are dealing with:
- Component placement process: OFE = Number of components placed
- Soldering process: OFE = Number of solder joints soldered
Therefore if we say our component placement DPMO = 100 this means that out of every million components placed 100 will be missing or misplaced.
If our wave solder DPMO =200 this means that out of every million solder joints 200 will be defective.
In our Electronic Assembly example we have 500 SMT components and 100 PTH components, so these are the corresponding OFEs. The total number of components used per board is therefore 600.
The 500 SMT components have a total of 1500 solder joints; these are the reflow solder OFEs.
The 100 PTH components have a total of 300 solder joints (the wave solder OFEs).
Process DPMOs define the quality of our different process steps and the OFEs are specific for each product type.
On each step we can estimate the expected defects per unit:
DPU = DPMO x OFE / 1 000 000
Since the types of defects contributed by the 5 different sources are independent, the total number of defects produced will be the sum of each defect contribution: the total DPU is the sum of all the DPUs.
Process Yield is the proportion of defect free units. If defects are independent DPU follows a Poisson distribution so Yield and DPU are related by:
Yield = exp(-DPU)
For instance, an average of 1 defect per unit will still produce exp(-1) = 0.37, that is, 37% defect-free units (some units will have more than one defect).
Total DPU is therefore the sum of DPUs while the total Yield is the product of yields.
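These relationships are easy to verify in code. The OFEs below come from the example, as do the PTH insertion and wave solder DPMOs; the remaining DPMOs are assumptions chosen so the totals match the 0.32 DPU used in the example:

```python
import math

# step: (DPMO, OFE for this board). The PTH-insertion and wave-solder
# DPMOs are quoted in the text; the others are assumed for illustration.
steps = {
    "supplier components": (50,   600),   # assumed DPMO
    "SMT placement":       (40,   500),   # assumed DPMO
    "reflow solder":       (40,   1500),  # assumed DPMO
    "PTH insertion":       (1500, 100),
    "wave solder":         (200,  300),
}
dpus = {s: dpmo * ofe / 1_000_000 for s, (dpmo, ofe) in steps.items()}
total_dpu = sum(dpus.values())     # defects add across independent sources
yield_ = math.exp(-total_dpu)      # Poisson: proportion of defect-free boards
print(f"total DPU = {total_dpu:.2f}, yield = {yield_:.1%}")
```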
Now let’s look at defect detection:
Control points (visual inspections and tests) are defined in the Control Plan for each of the product types. In this example we have several visual inspections, one ICT and one Functional Test performed in this order.
Visual inspection is performed as close as possible to the step generating the defect and the defects detected are typically corrected on the spot. The problem is that it is very inefficient: it can only catch 60% of the defects. In this example this means that out of the total 0.32 DPUs only 0.192 will be caught and corrected, therefore 0.128 will escape.
ICT uses a bed of nails to contact many points in the board and is able to test the functionality of some components. If the test coverage is 80%, 0.102 DPU will be caught in ICT and corrected and therefore 0.026 will escape.
Finally the functional test has a high test coverage but the test is not able to pinpoint where the defect is, therefore faulty boards need to be analysed and repaired by an expert through a lengthy process.
The proportion of boards that will need to be reworked as a result of visual inspection is 17%. This will require additional resources and skills. In the case of ICT and functional test 12% of boards will require rework and this may require a higher level of skills and it will take longer to identify and correct these defects.
At the end of this process we have about 0.0013 DPU going to the customer, which means 1280 faulty boards out of every million.
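The escape cascade can be reproduced with a few lines of Python. The 60% visual inspection and 80% ICT coverages are from the text, while the 95% functional test coverage is an assumption chosen to reproduce the 1280 ppm figure:

```python
# Each stage catches a fraction of the defects that reach it;
# whatever escapes the last stage goes to the customer.
dpu = 0.32                                 # total DPU entering inspection
for stage, coverage in [("visual inspection", 0.60),
                        ("ICT", 0.80),
                        ("functional test", 0.95)]:  # 0.95 is an assumption
    caught = dpu * coverage
    dpu -= caught
    print(f"{stage}: caught {caught:.4f} DPU, {dpu:.4f} escaping")
print(f"escaping to the customer: {dpu * 1_000_000:.0f} ppm")
```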
This is obviously unacceptable so we can start looking at the main contributors to defects and the main escapes of our detection and correction.
We can look at different improvement alternatives with this simulator and estimate the effect on the final ppms going to the customer.
PTH insertion, being a manual operation, is the main contributor to defects. We can look at ways to add Poka-Yokes (mistake-proofing devices) or look at ways to automate. Suppose we are able to reduce the PTH insertion DPMO from 1500 to 500: the result would be a reduction of the output to 1120 ppm.
If we can further improve the visual inspection coverage to 80% through automation, then it would go down to 560 ppm.
If we redesign the board to eliminate all PTH components and convert them to SMT we would further reduce it to 384 ppm.
In this simulator the process DPMOs come from measuring our current processes. Theoretically they should be product independent, but in actual fact they are not: some designs produce more manufacturing defects than others, and the process parameters for some products may not be optimized.
This simulation helps us focus on the main contributors of defects and defect escapes from our tests. In this way we can identify who has to do what in order to improve the process.