This is the ninth episode examining Dan Vacanti’s book Actionable Agile Metrics for Predictability: An Introduction. Previously we discovered the following points about Service Level Agreements:

  • A delivery commitment should be expressed as a date range with a confidence level.
  • Improvement can be measured as a decrease in the percentile values, and in the spreads between them.
  • Only a few data samples are necessary to determine a reliable confidence level.
  • The closer you are to a stable process, the fewer data points you need.
  • A Service Level Agreement can be determined only by analyzing your Cycle Time data.
  • You can have different Service Level Agreements for different kinds of work.
  • Percentile values can be used for sizing work items.
  • Percentile lines can be used as intervention triggers.
  • Service Level Agreements can and should be used in place of planning and estimation.
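As a reminder of how such percentile values are obtained, here is a minimal Python sketch that derives SLA-style percentiles from a list of Cycle Times. The data and the `percentile` helper are hypothetical, and the sketch assumes the nearest-rank percentile method; other percentile definitions exist and give slightly different values on small samples.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical Cycle Times, in days, for ten completed work items.
cycle_times = [3, 5, 2, 8, 4, 6, 12, 5, 7, 4]

for p in (50, 85, 95):
    print(f"{p}% of items finished within {percentile(cycle_times, p)} days")
```

With these sample values the commitment would read "85% of items finish within 8 days". Tracking how the 50th and 95th percentiles move closer together over time is one way to quantify the improvement described in the list.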

Now we will discover Dan Vacanti’s ideas about Classes of Service (CoS) and Pull Policies. This is arguably the most important chapter in his book.

Chapter 13 Pull Policies

Dan starts by describing how the aviation industry handles security screening: different types of passengers receive different treatment and are routed through different queues. While certain categories of passengers get preferential treatment, others wait in line. This example introduces the idea of a Class of Service.

Class of Service

Dan explains how the idea of Classes of Service typically comes about. In an ideal world there would be as many resources as there are jobs, or types of jobs, to be done. Unfortunately, resources are limited, so they have to handle different kinds of work. Some kinds of work may be, or may be perceived to be, more (or less) important than others. Hence the tendency to assign resources to work items with the intent of optimizing either the usage of the resource or the value it delivers. It is easy to see where this leads: different types of work receive different treatment, or service, much as business-class passengers receive a different service than economy-class passengers on an airplane.

A Class of Service is often seen as a way to actually identify different types of work. Yet there is a more general idea we can use to define a Class of Service. A Class of Service is a policy that determines the pulling sequence of committed work.

In fact, one can say that any pull policy establishes – de facto – a Class of Service.

There are a number of ways to characterize and categorize Classes of Service. Yet there is a common idea about how any Class of Service should be handled with respect to the process. A Class of Service must always be seen in relation to the point of commitment, and becomes effective only when a work item is pulled into the process. The decision about which Class of Service applies should be made only when a work item is first pulled. Any characterization of a work item by Class of Service prior to the commitment point is mere speculation: circumstances might change before the actual commitment, and the work item could even be abandoned entirely.

Also, one should not confuse the criteria for deciding which items to pull off the backlog with the pull priorities established by Classes of Service once items are committed to. The former is a decision about committing; the latter is a policy about priorities.
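The view of a Class of Service as nothing more than a pull policy over committed work can be sketched in Python. The `WorkItem` fields, the class labels and the two policy functions are hypothetical illustrations, not an API from the book. The Class of Service is recorded when the item crosses the commitment point, and it only influences the order in which already committed items are pulled.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class WorkItem:
    name: str
    committed_on: int       # day the item crossed the commitment point
    cos: str = "standard"   # Class of Service, assigned only at commitment

# A pull policy chooses the next committed item to work on.
PullPolicy = Callable[[List[WorkItem]], WorkItem]

def fifo(committed: List[WorkItem]) -> WorkItem:
    """Strict FIFO: the oldest commitment is pulled first."""
    return min(committed, key=lambda w: w.committed_on)

def expedite_first(committed: List[WorkItem]) -> WorkItem:
    """A de facto Class of Service: expedited items overtake everything else."""
    return min(committed, key=lambda w: (w.cos != "expedite", w.committed_on))

committed = [
    WorkItem("A", committed_on=1),
    WorkItem("B", committed_on=2, cos="expedite"),
    WorkItem("C", committed_on=3),
]
policy: PullPolicy = fifo
print(policy(committed).name)          # A
print(expedite_first(committed).name)  # B
```

Backlog replenishment, the decision about what to commit to, happens outside these functions: they only ever see work that is already committed.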

From a predictability viewpoint, Classes of Service are often a source of problems, especially if the underlying pull policies break the assumptions of Little’s Law. Any pull policy is a source of variability that will impact predictability and increase Flow Debt. Even minor changes to pull policies can have a huge impact on Cycle Time distributions.

It is important to notice that such impacts on variability and predictability are due to policies that you yourself set. The policies are entirely under your control. Policies induce self-inflicted variability! Therefore it is worthwhile to choose them wisely.

Dan explains why one would theoretically prefer a simple First-In-First-Out (FIFO) queuing policy over others. From a predictability standpoint, a FIFO queue is the most effective pull policy. The more you deviate from a strictly FIFO policy, the less predictable the process becomes.
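A small deterministic simulation, our own illustration rather than one taken from the book, shows how deviating from FIFO hurts predictability. One server works on items one at a time; eight hypothetical items arrive one day apart and each takes two days. LIFO (newest first) stands in here for any policy that lets some items overtake others. The `simulate` function and its parameters are assumptions made for this sketch.

```python
from statistics import mean

def simulate(arrivals, service, policy):
    """Single server, non-preemptive. Returns the Cycle Time of each item."""
    not_arrived = sorted(range(len(arrivals)), key=lambda i: arrivals[i])
    queue, cycle, clock = [], {}, 0
    while not_arrived or queue:
        # Move every item that has arrived by now into the queue.
        while not_arrived and arrivals[not_arrived[0]] <= clock:
            queue.append(not_arrived.pop(0))
        if not queue:  # idle until the next arrival
            clock = arrivals[not_arrived[0]]
            continue
        # FIFO pulls the oldest queued item; LIFO pulls the newest.
        item = queue.pop(0) if policy == "fifo" else queue.pop()
        clock += service[item]
        cycle[item] = clock - arrivals[item]
    return [cycle[i] for i in range(len(arrivals))]

arrivals = list(range(8))  # one item arrives per day
service = [2] * 8          # each item takes two days

fifo = simulate(arrivals, service, "fifo")
lifo = simulate(arrivals, service, "lifo")
print("FIFO:", fifo, "mean", mean(fifo), "max", max(fifo))
print("LIFO:", lifo, "mean", mean(lifo), "max", max(lifo))
```

Both policies complete the same work at the same moments, so the average Cycle Time is identical (5.5 days). The spread is not: under FIFO Cycle Times range from 2 to 9 days, while under LIFO they range from 2 to 15 days, which makes any percentile-based commitment much wider.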

Unfortunately, strict FIFO queuing may be impractical. For instance, resource availability might determine which work items can or cannot be pulled into the process at any given moment.

Even small policy changes can have a huge and unforeseeable impact. It becomes a balancing act between the best feasible decision and the best predictable one. If the nature of your process disallows FIFO queuing, you should strive to change the process to support FIFO pulling as much as possible.

Slack

Even if you limit self-induced variability by sticking to a FIFO pull policy as much as possible, you still cannot escape the variability inherent in your situation.

Dan gives the example of how FedEx consistently delivers on time despite the incredible amount of variability (package classes, loads, frequencies, etc.) that it must handle. FedEx does this by keeping empty planes in the air!

There is a profound lesson in the example of FedEx. The best way to handle variability and yet maintain high predictability is to deliberately build excess capacity — that is, slack — into the process.

Such a solution, keeping idle excess capacity ready to step in and handle variability, may not be popular with cost-focused accountants, and it might simply be rejected.
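One way to see why slack matters is the textbook M/M/1 queue from queueing theory. This model is our choice, not something derived in the chapter, and it assumes random (Poisson) arrivals at rate λ and exponentially distributed work times at rate μ. Its average time in the system is W = 1/(μ - λ). The sketch below uses a hypothetical service rate of one item per day and shows how the average Cycle Time grows as utilization approaches 100%, that is, as slack disappears.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Average time an item spends in an M/M/1 system: W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("no slack: the queue grows without bound")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 1.0  # hypothetical: one item completed per day on average
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    w = mm1_time_in_system(utilization * service_rate, service_rate)
    print(f"utilization {utilization:.0%}: average cycle time {w:.1f} days")
```

Going from 90% to 99% utilization multiplies the average Cycle Time by ten in this model, which is the kind of argument the FedEx example makes in favor of keeping planes partly empty.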

An often-proposed alternative is to put a WIP limit on any work that is being expedited. That will improve the predictability of expedited items to the detriment of standard items.

While in theory this might be acceptable, in practice it will not work. There will be a strong incentive to expedite all items.

In the worst case, with expedite lanes there will be a tendency to always have one or more expedited items in process. Expediting will effectively stop all work on standard items.

Any Class of Service will interfere with predictability, especially when it is used to expedite certain items to the detriment of others. This happens even if you have mitigating policies, like setting a very low WIP limit for expedited items. Dan puts it bluntly:

If you are going to have an expedited lane, and you limit that lane’s WIP to one, but there is always one item in it, then, I am sorry to say, you do not have an expedited process. You have a standard process that you are calling an expedited process, and you have a substandard process which is everything else.
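The situation in this quote can be reproduced with a deliberately simple, hypothetical model: one worker finishes one item per day, five standard items are committed on day 0 and pulled in FIFO order, and whenever an expedited item arrives it takes that whole day (an expedite lane with a WIP limit of one). The function name and parameters are our own assumptions.

```python
def standard_finish_days(n_standard, expedite_every, horizon):
    """Day on which each standard item finishes, or None if it never does.

    expedite_every: an expedited item arrives every N days (None means never).
    """
    waiting = list(range(n_standard))  # standard items, FIFO order
    finished = {}
    for day in range(horizon):
        if expedite_every and day % expedite_every == 0:
            continue  # today's capacity goes to the single expedited item
        if waiting:
            finished[waiting.pop(0)] = day + 1  # done at the end of the day
    return [finished.get(i) for i in range(n_standard)]

print(standard_finish_days(5, None, 20))  # no expedite lane
print(standard_finish_days(5, 2, 20))     # an expedited item every other day
print(standard_finish_days(5, 1, 20))     # expedite lane never empty
```

Without expediting, the standard items finish on days 1 to 5; with an expedited item every other day they finish on days 2, 4, 6, 8 and 10; and when the lane is never empty, no standard item finishes within the 20-day horizon at all.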

In general, Dan is extremely critical of introducing Classes of Service without fully understanding Little’s Law. The damage done by breaking the assumptions of Little’s Law far outweighs any speculative business value. Classes of Service are really speculative guesses about business value: certain types of work are believed to be more valuable than others and hence are allowed to take precedence over other kinds of work.

Even business value considerations and Cost of Delay calculations should not be deemed compelling reasons for departing from ideal FIFO queuing, as they are mostly speculative. The key consideration is that business value cannot be determined a priori; it can only be assessed once the work is delivered to the customer. Giving an item preferential treatment is therefore a bet. If that bet is wrong, you will certainly get less business value from the work item that receives preferential treatment. But it is worse than that: the highest-priority Class of Service will damage every other item put on hold!

That bad guess damages all the Work in Process that has to wait.

Dan’s strategy is very clear: you consider (speculative) business value only at replenishment time, as a selection criterion to decide what to pull. Things should then be done as fast as possible, without interference in their flow through the process. This will reap the actual business value sooner, compensating for any apparent delay due to complying with the strict FIFO policy.

Classes of Service are, therefore, viewed in a very negative way: as an institutionalized violation of Little’s Law. Dan’s whole book argues that every such violation damages the process’s predictability.

Classes of Service, as realized in most companies, actually introduce rather than reduce variability and unpredictability. Classes of Service are one of the main causes of Flow Debt, because only the highest-priority class gains, while all others lose predictability.

Instead of gambling on the speculative business value of certain kinds of work, Dan advises devising and employing pull policies that support predictability. The only items that can acceptably receive a preferential pull policy are true emergencies or work driven by regulatory compliance requirements. If such items occur frequently, then, again, it is a matter of designing the process so that there is sufficient capacity to deal with the extra workload.

You need to design your process so that it becomes predictable. A predictable process is one that behaves as you expect, and allows you to make accurate quantitative predictions about the future. Once the process is predictable, chances are you won’t ever need Classes of Service.

Classes of Service and Slack in TameFlow

Most of this chapter is entirely applicable to, and completely in line with, the TameFlow Approach. One should strive to get to a predictable process, and then try to avoid using Classes of Service. However, if they are employed after all, one should preserve the conditions of Little’s Law by setting the pull policies of the Classes of Service accordingly, interfering as little as possible with a plain FIFO queue policy.

The idea of reserving excess capacity is integral to the Theory of Constraints. In fact, that is the whole point of Step 3 of the Five Focusing Steps: subordinating everything else to the constraint. Subordination is only possible if other resources have more capacity than the constraint. It is curious how the idea of allowing for slack might be at odds with the lean principle of removing waste. At times, pushing waste reduction too far may reduce your excess capacity to the point that you are no longer able to handle the natural variability that hits your process. One symptom of this is the moving constraints syndrome.
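The link between slack and a stable constraint can be sketched with a toy three-stage workflow. The stage names, capacities (items per week) and the size of the disruption are all hypothetical. The stage with the lowest capacity limits throughput; when a non-constraint stage has little spare capacity, an ordinary dip is enough to move the constraint.

```python
def constraint(capacities):
    """The stage with the lowest capacity limits the throughput of the whole flow."""
    return min(capacities, key=capacities.get)

def after_dip(capacities, stage, loss):
    """Capacities after a temporary loss at one stage (e.g. someone out sick)."""
    dipped = dict(capacities)  # copy, so the original plan is left untouched
    dipped[stage] -= loss
    return dipped

lean = {"analysis": 9, "development": 8, "testing": 9}     # almost no slack
slack = {"analysis": 12, "development": 8, "testing": 12}  # deliberate slack

for name, caps in (("lean", lean), ("slack", slack)):
    print(name, constraint(caps), "->", constraint(after_dip(caps, "testing", 3)))
```

In the lean configuration the same three-unit dip in testing turns testing into the constraint, while the configuration with slack keeps the constraint in development, where subordination can still be planned around it.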

It is always a challenge to sell the necessity of excess capacity to cost-focused management. This is one instance where attention to Throughput Accounting can be helpful.

Expedited treatment is very much despised in TameFlow, though not so much because it breaks Little’s Law, as Dan (correctly) points out. Rather, expediting introduces partial interests, especially when there are multiple stakeholders who fight one another to gain preferential treatment. Expediting and preferential treatment are recognized as a source of conflict that undermines TameFlow’s Unity of Purpose. TameFlow is firmly resolved to remove all conflicts; so if expediting creates conflicts, it should not be used.

Certain types of work cannot avoid being expedited: typically emergencies that must be dealt with without delay. TameFlow goes to great lengths to describe how to treat any such extra work, as covered in the earlier post Management of Extra Work.

Like the previous chapter, this one is also a great source of wisdom for any TameFlow practitioner. Avoiding Classes of Service, sticking to a strict FIFO pull policy, designing the process for predictability, and countering variability with excess capacity are all valuable concepts embraced by TameFlow.


Links: