A Test/ Repair loop can become a bottleneck for the whole Value Stream when the test First Pass Yield is lower than planned.

A drop in test FPY is normally caused by problems upstream in the Value Stream. This Test/ Repair loop is often absent in Value Stream Maps in spite of the potential to become the bottleneck for the total process. 

You can download this example file:  TestRepair.xlsm

This Excel file simulates a Test/ Repair loop such as:

Ideal Situation

The ideal case is when FPY = 100%. In this case no repair is necessary and the required test capacity is just 100 units per hour, which matches the Value Stream throughput.

Close all other open Excel files before you open this one, and enable macros.

To simulate one hour of operation, press F9. Press the RUN button to run 100 cycles.

To reset (set all Work-In-Process to zero and restore the default values), press Ctrl + r.

You can only write in the yellow cells.

This Test/ Repair loop is part of a Value Stream with a Throughput of 100 units/ hour (Takt = 36 seconds). You therefore need to work out the minimum Test and Repair capacities required to deliver 100 units each hour.
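The Takt figure quoted above follows directly from the throughput. A minimal sketch of that arithmetic (the function name is only illustrative):

```python
def takt_time_seconds(units_per_hour: float) -> float:
    """Seconds available per unit at the given hourly throughput."""
    if units_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return 3600.0 / units_per_hour


print(takt_time_seconds(100))  # 36.0
```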

Test FPY Drop

Let's assume 1% of the units are failing test: Test FPY = 99%:

The immediate result is that the output of this loop drops to 99: It has become the bottleneck of the Value Stream.

Since we are not repairing the faulty units, they accumulate in front of the Repair station.

How much repair capacity do we need in this case?

All repaired items need to be retested so they will add to the new items entering the loop. What test capacity do we need then?

We can calculate this without the need for simulation:

In this case when we test 100 units only 99 come out OK (we have multiplied 100 by FPY which is 0.99). 

If we need an output of 100 items OK how many do we need to test? 

Answer: We divide by the FPY:  100/ 0.99 = 101.01 units per hour

And the repair capacity will be: 101.01 - 100 = 1.01 units per hour
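One way to see why dividing by the FPY gives the right test load is to add up the retests. This sketch assumes that repaired units fail the retest at the same rate as new units, which the text does not state explicitly. The symbols are introduced here only for illustration: T is the required output per hour and y is the FPY.

```latex
\[
C_{\text{test}} = T\left(1 + (1-y) + (1-y)^2 + \cdots\right) = \frac{T}{y},
\qquad
C_{\text{repair}} = C_{\text{test}} - T = T\,\frac{1-y}{y}
\]
```

With T = 100 and y = 0.99 this gives a test load of about 101.01 units per hour and a repair load of about 1.01 units per hour.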

Repair Time Variation

Test time typically doesn't have much variation, especially if the test is automatic. Repair time, on the other hand, tends to have high variability. Some automatic testers may provide specific repair instructions, but very often repair requires an investigation which might take a long time.

Let us assume that in our example an average repair capacity of 1.1 items/ hour has a standard deviation of 0.5:

The frequency distribution of this repair capacity will be: 

And the result will be:

We see that occasionally output will drop and a queue will develop waiting for repair. This means we will need additional average repair capacity to compensate for this variation:

Now the "waiting for repair" queue has moved to "waiting for test" but output is maintained at 100.
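The effect of repair variation can also be explored outside Excel. The following is a rough sketch, not a reproduction of the spreadsheet's logic: it assumes hourly repair capacity is drawn from a normal distribution truncated at zero, treats units as fractional quantities, and sends repaired units back to the test queue for the next hour. The function name, the default test capacity of 102 and the alternative repair mean of 1.5 are illustrative assumptions; the repair mean of 1.1 and standard deviation of 0.5 come from the example above.

```python
import random


def simulate_loop(hours, arrivals=100.0, fpy=0.99, test_capacity=102.0,
                  repair_mean=1.1, repair_sd=0.5, seed=0):
    """Simulate a Test/Repair loop hour by hour (simplified model)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    test_queue = 0.0
    repair_queue = 0.0
    total_output = 0.0
    for _ in range(hours):
        # New units join whatever is already waiting for test.
        test_queue += arrivals
        tested = min(test_queue, test_capacity)
        test_queue -= tested
        # Expected fraction passes; the rest waits for repair.
        passed = tested * fpy
        total_output += passed
        repair_queue += tested - passed
        # Repair capacity varies from hour to hour (assumed normal, never negative).
        capacity = max(0.0, rng.gauss(repair_mean, repair_sd))
        repaired = min(repair_queue, capacity)
        repair_queue -= repaired
        # Repaired units must be retested in a later hour.
        test_queue += repaired
    return {
        "avg_output": total_output / hours,
        "test_queue": test_queue,
        "repair_queue": repair_queue,
    }


# Compare two average repair capacities over 100 simulated hours.
for mean in (1.1, 1.5):
    print(mean, simulate_loop(100, repair_mean=mean))
```

Changing `seed` gives a different random sequence, so several runs are needed before drawing conclusions about queue sizes.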

Yield Drop Effects

Imagine that test FPY drops to 98% due to a problem upstream, maybe a supplier:

We will need additional test capacity to retest the repaired units, and also additional repair capacity.
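To put numbers on this, the same divide-by-FPY calculation can be repeated for each yield. This is a small sketch under the assumption that every failed unit is repaired and retested at the same FPY; the function name is illustrative.

```python
def required_capacities(throughput, fpy):
    """Return (test, repair) capacity in units/hour needed to ship `throughput`."""
    if not 0.0 < fpy <= 1.0:
        raise ValueError("fpy must be in (0, 1]")
    test = throughput / fpy      # every good unit costs 1/FPY tests on average
    repair = test - throughput   # everything tested beyond the output was a retest
    return test, repair


for fpy in (1.00, 0.99, 0.98):
    test, repair = required_capacities(100.0, fpy)
    print(f"FPY {fpy:.0%}: test {test:.2f}, repair {repair:.2f} units/hour")
```

Under these assumptions a one-point drop in FPY, from 99% to 98%, roughly doubles the repair load.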

The standard deviation of the test capacity will typically also increase, and this will cause WIP to accumulate both before test and before repair. This increase in WIP will lengthen the overall lead time of the Value Stream.

Not Repairable Units

We have assumed that all units can eventually be repaired, but this is not always the case.

If you have yield loss, see:

https://polyhedrika.com/2-uncategorised/12-process-simulation-2

Conclusion

The Test/ Repair loops should normally be included in the Value Stream Map because they are critical steps in the process which could become the bottleneck of the Value Stream. See an example in:

https://polyhedrika.com/2-uncategorised/14-value-stream-map-with-excel

A drop in Test FPY should trigger immediate actions to analyse and correct the source of the defects.

In the meantime we will need additional Test and Repair capacity to maintain the Value Stream throughput.

The Repair operation may require skills that are in short supply, so enough training should be provided to ensure it doesn't become the bottleneck for the total process.

By developing a robust Repair process we can reduce the average repair time and its standard deviation, which reduces this risk.