System constraints are key when it comes to optimizing our process. The process bottleneck limits the overall throughput and it determines such things as manufacturing lot sizes. 

With this simple manufacturing line simulation you can experience the effects of alternative solutions in order to maximize profit. 

Download Excel file:  TOCeng4.xlsm

Close other Excel files before you open this one and enable Macros.

Process Objective

Run the simulator to obtain the maximum profit after one simulated week. 

You have an initial capital of 1000 € which you can use to buy materials to feed the blue, green and orange machines. 

The green machine performs 3 operations: b, c and d. All parts should be processed through all 3 so you should decide what manufacturing lot size you want and process the lot through each of these operations. Before each operation there is a setup time. In the same way the orange machine has 2 operations: e and f. 

The market will accept any amount of product P with a price of 70 €. Spares P1 and P2 can also be sold but their quantities can never be above the number of products P already sold (if you have already sold 5 P's you can sell, if you want, 5 P1's and 5 P2,s).

Fixed expenses amount to 2000 €/ week and they will be subtracted from the cash balance at the end of each week. 

Week 1 will start with an empty line so you can simulate one single week leaving an empty line at the end.

You can also simulate several weeks, in which case you don't empty the line at the end. 

 Simulator Operation

You operate the simulator with control buttons:

You can either press the start button or use Ctrl + s. The same with the others. The reset button will empty the line and start simulation from zero.

The counter will tell you where you are:

One week is 5 working days of 8 hours. The simulator will stop at the end of each day: just press start to continue.

You start by buying materials based on the lot sizes you have decided and you must select the operation you want to run in the green machine from the pull-down menu: b, c or d.

The same with the orange machine: select e or f.

You can see the details in the Help sheet.

To transfer to the next machine type the amount to be transferred on the yellow boxes. 

Financial control

You can control your financial situation in real time:

You can buy materials as long as you have a positive balance. 

System Constraints

You may want to try your manufacturing strategy with this simulator before you go into a deeper analysis.

These are the constraints of the different machines:

The bottleneck of the whole line is therefore the blue machine (operation a): each product P will need 60 minutes of this machine. 

You will notice that we are only considering the process times (not the setup times). The reason is that the influence of setup times can be eliminated by using large enough lot sizes as we will see later.

Another constraint is the fact that all products P and spares P1 need machine a (the bottleneck). Spares P2, although they don't need machine a to be produced they can't be sold unless products P (which need machine a) have been sold. 

The bottleneck dictates how much we can produce and therefore the profit. We must focus on optimizing the bottleneck time to maximize profit:

To produce a spare P1 we need 60 minutes of bottleneck time and obtain a profit of 30€. If we use that time to produce a product P this allows us to sell also a spare P2 (which doesn't use the bottleneck) and the profit will be 70€.

The conclusion is that we should not produce any spares P1.

Theoretically we should be able to produce and sell 40 P's and 40 P2's per week.

 Lot size

If we decide to produce only P and P2 it will take 60 minutes of bottleneck a to produce one of each. In the green machine we will need to process 2 units: one for P and one for P2; this will take 18 x 2 = 36 minutes of process time.

To process a lot in the green machine we will need 3 setups of 40 min (total 120 min) and 18 x lot size processing time.

During this time bottleneck a must process lot size/ 2 units (only for P). If we dedicate all spare time in the green machine to do setups we conclude that the lot size is 10. 

Indeed, to process these 10 parts we will need 300 minutes which is the time it takes to process 5 parts in the blue machine a.

This means that if we reduce the lot size below 10 the green machine will be more restrictive than blue a so it will become the new bottleneck.

In practice to avoid the green machine from becoming the bottleneck we must leave a margin choosing a lot size greater than 10.

If we apply the same reasoning to the orange machine:

In this case the minimum lot size is 8. It may not be practical to have different lot sizes in both machines so we may decide on a value above both such as 12 to compensate for any inefficiencies.

Possible Results

Theoretically starting with a full line we should be able to produce and sell 40 P,s and 40 P2,s which gives us a weekly margin of 40 x 70 = 2800 €. If we subtract the weekly fixed cost of 2000 € that leaves us a profit of 800 €

Starting with an empty line and leaving it almost empty at the end of one week we obtained the results:

Capacity Utilization

On the top right corner of the simulator we can keep track of the capacity utilization on each of the machines.

At the end of the week we obtained the following results:

The first thing we notice is that the bottleneck a has been producing 100% of the time: one minute lost in the bottleneck would be a loss for the whole line.

The green and the orange machines have been stopped at the end of the week in order to empty the line and also due to inefficiencies on each setup. 

Machines g and h were stopped at the beginning of the  week due to the empty line and also along the week due to their excess capacity. 


System constraints need to be considered when it comes to optimizing a process.

The bottleneck defines the maximum possible throughput for the total process.

The bottleneck defines the rate at which product should be started in the line: starting above the bottleneck capacity will only build up Work In Process and it will not increase throughput

In machines with several operations we want the minimum manufacturing lot size but not so small that it becomes the bottleneck of the total process.

Capacity utilization should be 100% in the bottleneck but not in the rest of the operations.


A Test/ Repair loop can become a bottleneck for the whole Value Stream when the test First Pass Yield is lower than planned.

A drop in test FPY is normally caused by problems upstream in the Value Stream. This Test/ Repair loop is often absent in Value Stream Maps in spite of the potential to become the bottleneck for the total process. 

You can download this example file:  TestRepair.xlsm

This Excel file simulates a Test/ Repair loop such as:

Ideal Situation

The ideal case is when FPY = 100%. In this case no repair is necessary and the required test capacity is just 100, which corresponds to the Value Stream throughput.

You must close all open Excel files before you open this one and you should enable Macros.

To simulate 1 hour operation just press F9. Keep pressing to see the evolution.

To reset (put all Work-In-Process to zero) press Ctrl and r keys.

You can only write in the yellow cells.

You need to work out the minimum required Test and Repair capacities in order to deliver 100 units each hour (balanced loop).

Test FPY Drop

Let's assume 1% of the units are failing test: Test FPY = 99%:

The immediate result is that the output of this loop drops to 99: It has become the bottleneck of the Value Stream.

Since we are not repairing the faulty units they are accumulating in front of the Repair station. 

How much repair capacity do we need in this case?

All repaired items need to be retested so they will add to the new items entering the loop. What test capacity do we need then?

We can calculate this without the need for simulation:

In this case when we test 100 units only 99 come out OK (we have multiplied 100 by FPY which is 0.99). 

If we need an output of 100 items OK how many do we need to test? 

Answer: We divide by the FPY:  100/ 0.99 = 101.1 units per hour

And the repair capacity will be: 101.1 - 100 = 1.1 units per hour

 Repair Time Variation

Test time typically doesn't have much variation, specially if it is automatic. Repair time, on the other hand, tends to have high variability. Some automatic testers may provide specific repair instructions but very often repair requires an investigation which might take a long time. 

Let us assume that in our example an average repair capacity of 1.1 items/ hour has a standard deviation of 0.5:

The frequency distribution of this repair capacity will be: 

And the result will be:

We see that occasionally output will drop and a queue will develop waiting for repair. This means we will need additional average repair capacity to compensate for this variation:

Now the "waiting for repair" queue has moved to "waiting for test" but output is maintained in 100. 


The Test/ Repair loops should normally be included in the Value Stream Map because they are critical steps in the process which could become the bottleneck of the Value Stream. See an example in:

A drop in Test FPY should trigger immediate actions to analyse and correct the source of the defects.

In the mean time we will need additional Test and Repair capacity to maintain the VSM throughput.

The Repair operation may require high skills in short supply so enough training should be provided to insure it doesn't become the bottleneck for the total process. 

By developing a robust Repair process we can reduce the average repair time and its standard deviation reducing this risk. 


Real Time Statistical Process Control enables the operators running the process to know when action is required, as described in

In this second article we will analyse some of the Out-Of-Control situations we can detect and the corresponding corrective actions required.

You can download an Excel file with these examples:


Individuals and Moving Range SPC charts

In sheet A we have collected process yield data every hour during several shifts.

Since we collect a single value each hour we are using the pair:

  • chart for individuals
  • moving range chart

The individuals chart has the actual values and the MR chart the absolute values of the change between two consecutive values.

In both charts we automatically apply the Western Electric rules for Out-Of-Control symptoms and the alert will appear in the corresponding column: N - W for the individuals chart and Y - AH for the MR chart. 

When any of these alerts appear the corresponding point in the chart will be colored: 

  • In red: upward symptom
  • In green: downward symptom

 Process Change

Looking at the charts above, the individuals chart shows green dots from 10:00 to 16:00 and red dots between 1:00 and 3:00 during the night. We can notice a jump from 16:00 to 17:00 which might indicate the moment when the process has changed but this jump is not significant enough to declare it as a process change. 

It is not until 1:00 that we have enough evidence to declare that the process has changed.

In this case, since we are measuring yield, an increase in yield is an improvement. 

So we can say that process yield has improved sometime between 16:00 and 1:00. With this information we should now investigate what happened during this period of time that caused this improvement. 

How much have we improved?  If we compute the average between 6:00 and 16:00 we get an average yield of 93. And from 17:00 to 11:00 the next day 118. So we can estimate an average yield increase of 25.

Process Improvement Symptoms

Let's now analyse the alerts in our sheet A:

The -4 values in the individuals chart indicate 4 out of 5 consecutive values 1 sigma below the center line: values significantly below the average. The value -9 at 14:00 means that the last 9 values were below the center line: same interpretation.

The values 9 starting at 1:00 mean that from 17:00 to 3:00 all values have been above the center line: yield has significantly increased. 

Finally, value 1 in moving range indicates that a significant change has occurred in the yield at 5:00. Looking at the individuals chart we see it corresponds to an increase. 

Slow Trend

Let's interpret the charts in sheet B:

 We notice that there is an upward trend in yield as shown by the green dots up to 6:00 in day 11 followed by red dots starting at 12:00 until the end. 

The alert in the MR chart on 4:00 corresponds to a significant one time decrease in yield so we can interpret it as a false alarm. 

The interpretation of this situation is that we have a continuous process improvement starting around 11:00 on day 11 maybe due to some ongoing improvement actions. 

Let's now look at the alerts:

They confirm a significant yield increase starting around 12:00 of day 11.

Variation Increase

We now analyse sheet C:

An increase in yield variation is visible in the individuals chart but no alarms are visible. 

The MR chart, on the other hand, is meant to detect this variation, and so it does.

Variation is significantly below the center line, as shown by the green dots before 20:00 and a significant increase is shown by the red dots starting at 3:00 on day 11. 

If we only measure averages, as we often do, this problem would remain undetected.

Variation is always a bad thing for our process, so we can say our process has got worse starting around the end of day 10. 

Yield variation will eventually cause inefficiencies, late deliveries and accumulation of WIP so this problem needs to be addressed before these side effects start to appear.

 Day of the Week Effect

Let's now analyse the example in sheet D:

Here we are collecting daily production data and we notice that our average production is 400 and our charts are not detecting any significant changes in the process.

On the other hand we can see a repetitive cycle of 7 days which, of course, corresponds to a week. 

If we look in our calendar the days of lowest production we realize they correspond to Sundays and the next lowest to Saturdays.

This is telling us that maybe there is significant seasonal trend within each week. 

In order to check that we have added column AT to our spreadsheet where we calculate the day of the week corresponding to each date with this formula:


Where A3 holds the date and "2" is to start counting the week on Monday. 

We can now do an ANOVA of our Production Vs Day of the Week with Minitab:

We can now confirm that there are significant differences among the different days of the week: Wednesdays and Fridays we have the highest productions. On Thursdays, for some reason, it is significantly lower and, of course, it is lowest on Saturdays and Sundays.

 Shift Change Effect

We have been suspecting that we have a drop in production at the start of a shift, so we have collected data of hourly production in sheet E to check this effect. These are the resulting charts:


In this case, just like in the previous one, there is no indication of Out-Of-Control but we can see peaks in the MR chart which correspond to the start of each shift.

To see how significant are these differences we add a column with the hour of the shift: 6:00, 14:00 and 22:00 are hour 1 of the shift.

Now we do an Anova in Minitab of Production Vs Shift Time:

We notice that production in the last hour of the shift is significantly higher than the average and in the first hour it is significantly lower. 

We would need to analyse the root cause of this effect which represents an inefficiency in our process. 

One possible cause may be to try to ship as much product as possible at the end of the shift in order to meet the  target. And, to do that, operators are moved to the end of the line. The side effect is that we leave an empty line for the next shift so some operators will have no work until the line fills up.


Real Time SPC is a useful tool for operators in order to control their process by making the required adjustments as soon as needed but without over-reacting.

Control charts are useless if analysis and corrective actions are not done in real time.

Operators need to be trained to do this analysis and to know what corrective action needs to be done and when.

The Standard Work Combination of machine and operator times tries to meet the throughput requirements of the line with the available machines and operators.

You can download this example file:     Standard-Work-Combination-Table.xlsx

In this example 4 autonomous machines are handled by one operator who is performing the following operations on each machine in sequence:

  1. Unload finished part from the machine
  2. Load the new part
  3. Start the machine
  4. Take the removed part to the next machine

The following table contains the operator times to perform these operations as well as the machine times involved:

The manual times include the loading and unloading of the parts in the machine therefore these times are included in the machine times. Walk times involve the operator but not the machines.

Each start time is calculated from the previous start time plus manual and walk times.

The Manual Gantt chart above is built from the start times and manual. You can see that the delay from a manual time and the next is the walk time.

The quality control operation is performed by the operator and no machine is involved. 

The Machine Gantt Chart is built from start times and machine times. 

If we want to maximize the throughput of the line with one single operator the cycle time would be 60. This is the time between two finished parts leaving the line. If we are talking about seconds a cycle of 60 seconds corresponds to a throughput of one unit per minute. 

This cycle was calculated with operator times but did not take into account machine times so we need to insure that no machine times are above the cycle time. 

Lead Time Calculation

How long will it take for one item from start to finish? This is what we call lead time:

You can see it takes 5 cycles for one item to go through the total process so lead time will be 300. 

Increase Throughput

If a cycle of 60 is what the market demands we can handle this line with one single operator. We realize that the machines are kept idle most of the time because it does not make sense to produce more than market demand. 

If we need to increase throughput (decrease cycle) we can add another operator to do this:

If we assign work in this way the cycle would be 25 for operator 1 and 35 for operator 2. This means we can have an overall cycle of 35 and all machine times are below this value. 

The corresponding Gantt charts are:

So with 2 operators we can almost double throughput: cycle is reduced from 60 to 35. 


Standard work combination charts allows planning with machine and operator times in order to achieve the required throughput with the existing equipment and using the minimum human resources. 

Statistical Process Control is used to detect when a significant change has taken place in a process. 

All processes have variation as observed in any of their metrics but not all variation is significant from the statistical point of view. 

When we want to control a process by adjusting some process parameter we can make two mistakes:

  • Over-react by adjusting when we shouldn't
  • Under-react by failing to adjust when we should

Control over-reaction

A case of over-reaction is illustrated by the following example:

Someone is shooting at a target and based on the deviation of the impacts he adjusts the gun site after each shot.

The end result will be an increase of the dispersion of the impacts, therefore the adjustments will make the process worse. 

The correct way is, of course, to fire 5 or 6 shots without adjustments and then decide if adjustment is required based on the center of the impacts.

Control under-reaction

This is an example of under-reaction:

If you drop a live frog in boiling water it will immediately jump out to save its life.

But if you put it in a pot of cold water and heat it the frog will eventually pass out without any reaction.

Many companies have fallen into this trap: 

  • A big disaster generates a quick and effective reaction and the company recovers
  • A slow degradation of their KPIs such as customer sat pass undetected until it is too late 

Statistical Process Control

To avoid these two mistakes we can use SPC to control a metric to find out if the observed changes are SIGNIFICANT from the statistical point of view.

We will illustrate this with the use of an example: control your own weight to check if your diet is leading to your weight target or not.

You can download this example Excel file and replace the data in columns A and B with your own data:


 Data collection

You can decide the frequency of your data collection. In this example I have collected it daily. 

It is important to collect it at the same moment (more or less) each day.

You can either have your Excel file in your smartphone or in the cloud (Google Drive or Microsoft's One Drive)

Results interpretation

I am assuming our target is to reduce weight, therefore a downward trend will put the numbers in green and an upward trend will put them in red.

SPC uses some rules, developed by the Western Electric company, to detect symptoms of SIGNIFICANT trends . They are shown below with a screen of Minitab SPC for individuals:

The center line is the average of all values and the standard  deviation is also estimated by all the values.

In our Excel file upward trend symptoms are in red and downward in green.

The Excel file applies rules 1, 2, 3, 5, and 6 of this list.

Weight data interpretation

On 2/4/2018 weight 78.8 is in red and we see the explanation with a number 2 in column N which means that two values (1/4/2018 and 2/4/2018) with 78.8 Kg are 2 σ above the average. This is passed data. What really interests us is the data we have just collected.

The numbers start getting green on 8/4 with 78 kg and the reason is that out of the 5 last numbers the last 4 are 1 σ below the average. This is interpreted as a SIGNIFICANT downward trend. This trend continues the following days.

 Statistically significant trend doesn't mean that this reduction is acceptable but at least we are moving in the right direction. 

Results analysis with Minitab

In our Excel file there is no graphic: we just want to alert of significant trends 

Looking at this same data in Minitab we can see an SPC chart:

The red dots indicate trends. looking at the graph we notice the first two are upward and the last three downward so the overall trend is downward.

Stable Process

There is a common misconception about the meaning of process stability

For instance some might call the process characterized by the following data unstable:

But if we look at our SPC:

It doesn't give us any alert of unstability

This is a STABLE process consisting simply of throwing dice. Since we have always thrown dice the same way SPC is telling us that this is a stable process: no SIGNIFICANT change has been detected.

Stability is not always a good thing. In the case of our weight control stability would mean that there is no improvement. 

We want stability when the process is OK. If we improve this will show with SPC alerts in the right direction.

 Use for other processes

This Excel file can be used to control a variable other than weight at work. For instance:


In this case we are doing an hourly control of our process yield. In this case Yield increase is good so Red is Good.

At 20:00 we have enough evidence of a Yield increase although the change seems to have taken place at 16:00. 


SPC enables the right detection of process change.

High process intrinsic variation can lead to over-reaction (adjust when you shouldn't) or under-reaction (not reacting when you should)

Part 2: Real-Time SPC Analysis

Real-Time SPC Analysis