Welcome to the eighth installment about Dan Vacanti’s book Actionable Agile Metrics for Predictability: An Introduction. In the previous post on Scatterplots we discovered the following highlights:

  • A Scatterplot is not a Control Chart.
  • Cycle Time data is not normally distributed.
  • Percentile Lines can be drawn independently of the data’s distribution.
  • Scatterplots give a temporal view and can uncover trends over time.
  • Histograms give an idea about the shape of the data’s distribution.
  • The shapes of Scatterplots are really a reflection of organizational policies.
  • Some of the most common shapes are: Triangle, Clusters and Gaps.
  • You cannot identify special/common causes simply by looking at a Scatterplot.
  • It is really important to figure out if any variability is self-imposed rather than being out of control.
  • It pays to identify Internal and External Variability.

In this episode we will see what Dan Vacanti has to say about Service Level Agreements (SLA).

Chapter 12 Service Level Agreements

A team should be comfortable taking on commitments. The team making the commitment should be empowered to choose its commitment points. Expectations should be set in collaboration with the client. Missed commitments should be used as learning opportunities, and not as reasons to punish teams.

A delivery commitment should be expressed as a date range with confidence levels. When establishing time targets, the percentile lines in a Scatterplot can be used as guidance.

The objective is not only to meet the target percentile the corresponding percentage of the time, but also to decrease the time period that corresponds to that percentile.

Improvement can be measured as a decrease in the percentile values, and in the spreads between them. Decreasing percentile values means faster delivery. Spreads mean variability, and reducing variability means improving predictability. Naturally the choice of percentile lines affects this. You can start with the standard percentiles, and later change them to other percentiles that better reflect your process.
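To make the idea of percentile lines and spreads concrete, here is a minimal sketch in Python. The Cycle Time values are made up for illustration, the levels 50, 70, 85 and 95 are assumed as the "standard" set, and the nearest-rank percentile is just one of several common definitions; a charting tool may compute the lines slightly differently.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that
    at least p percent of the data points are at or below it."""
    if not values:
        raise ValueError("need at least one data point")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_lines(cycle_times, levels=(50, 70, 85, 95)):
    """Map each percentile level to its Cycle Time value."""
    return {p: percentile(cycle_times, p) for p in levels}

# Hypothetical Cycle Times in days (illustrative data only).
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9,
               10, 12, 14, 15, 18, 21, 25, 30, 34, 45]

lines = percentile_lines(cycle_times)
spread = lines[85] - lines[50]  # a smaller spread suggests less variability

for level, days in lines.items():
    print(f"{level}% of items finished in {days} days or less")
print(f"Spread between the 50th and 85th percentile: {spread} days")
```

Read this way, a commitment could be phrased as "85% of the time we deliver in 25 days or less". Recomputing the lines over time shows whether both the values and the spread are shrinking.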

It is often believed that any statistical approach needs a huge sample of data points in order to establish any significance. In fact, only a few data points are necessary to determine a reliable confidence level. Somewhere between a dozen and 30 samples is usually sufficient.

Naturally, predictability relies not only on the number of samples, but also on how well the assumptions of Little’s Law are upheld. In fact, the quality of the process has more impact on overall predictability than the number of samples you consider. The closer you are to a stable process, the fewer data points you need.

A Service Level Agreement can be determined only by analyzing your Cycle Time data. It cannot be dictated by wishful-thinking managers. It must be collaboratively agreed upon with the customers or stakeholders.

Little’s Law is valid even when segmenting Work in Progress (WIP). So you might be tempted to start categorizing your work into many different kinds in order to do complex analytics.

However, Dan advises starting with just one type. It is only at later stages, when the process is stable and predictable, that it might make sense to introduce further types. One reason to do so is naturally to track items that experienced irregular flow, like abandonment or skipping of stages.

Another reason is to be able to identify the different percentiles for Service Level Agreements for different kinds of work. Of course, the assumptions of Little’s Law must be upheld for each segment. This has an important consequence: You can have different Service Level Agreements for different kinds of work.

Percentile values can be used for sizing work items. If any work item appears likely to exceed the worst percentile reference value, then it should be broken up into smaller items.

The percentile lines can also be used to monitor aging work items, serving as intervention triggers. Effectively, they are checkpoints where you may ponder the state of the work items that literally cross the line.

Specifically, you want to avoid work items passing the commitment percentile, and ideally act proactively and decisively before they get there (because beyond that point, predictability will be doomed entirely!). It is easy to do: you just monitor a work item’s age against its Service Level Agreement percentile.
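As a rough illustration of such monitoring, the sketch below compares a work item's age with two thresholds. The function name, the status labels and the example thresholds (15 days as an early checkpoint, 25 days as the commitment percentile) are assumptions made for this example, not something prescribed by the book.

```python
from datetime import date

def aging_status(started, today, sla_days, warn_days):
    """Return (age_in_days, status) for a work item still in progress.

    sla_days  - Cycle Time at the commitment percentile (e.g. the 85th)
    warn_days - an earlier percentile used as an intervention trigger
    """
    age = (today - started).days
    if age > sla_days:
        return age, "breached"
    if age >= warn_days:
        return age, "at risk"
    return age, "on track"

# Hypothetical item started on 1 January, checked on 20 January.
age, status = aging_status(date(2024, 1, 1), date(2024, 1, 20),
                           sla_days=25, warn_days=15)
print(f"Item is {age} days old: {status}")
```

Running a check like this daily over all items in progress gives the team a chance to swarm on the "at risk" ones before they cross the commitment line.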

Service Level Agreements can and should be used in place of planning and estimation. Percentile lines can quickly give you date ranges and confidence levels to which you can commit, without the need of speculative estimation and upfront planning.

Service Level Agreements and TameFlow

Most of this chapter is entirely applicable to and completely in line with the TameFlow approach. It becomes particularly inspiring when we consider the use of Minimum Marketable Releases (MMRs) combined with Buffer Management. The reason is that Cycle Time distributions and percentile lines are an excellent way to both size and position the MMR Buffer. When using MMRs with buffers, Buffer Management is the premier tool you use to monitor execution, and thus the level of service you are committed to.

When you start with TameFlow and use MMRs and buffers, the so-called Cut and Paste Method (C&PM) for sizing and positioning the buffer is the recommended heuristic to use. It means you set the buffer size to half the mean duration of your Cycle Time data, and position the buffer starting at the mean. This is actually a “quick and dirty” rule of thumb. It works great to get started. In most cases, though, it results in buffers that are too large, and thus not responsive enough (“lazy”) in their signals.
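The arithmetic of this rule of thumb can be sketched as follows. This is only one reading of the description above (a buffer that starts at the mean and spans half the mean); the function name and the sample data are hypothetical.

```python
def cut_and_paste_buffer(cycle_times):
    """Return (buffer_start, buffer_end) for the rule of thumb:
    the buffer starts at the mean Cycle Time and is half the mean long."""
    if not cycle_times:
        raise ValueError("need at least one data point")
    mean_ct = sum(cycle_times) / len(cycle_times)
    return mean_ct, mean_ct + mean_ct / 2

# Hypothetical Cycle Times in days.
start, end = cut_and_paste_buffer([4, 8, 12, 16])
print(f"Buffer runs from day {start} to day {end}")
```

Because the mean of a skewed Cycle Time distribution is pulled upward by a few long-running items, a buffer derived from it tends to be wide, which is consistent with the "lazy" signals mentioned above.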

There are a number of buffer sizing methods in the literature of the Theory of Constraints, and some are just too heavy to be practical except for the most critical projects.

However, if you do collect Cycle Time data (as you should) and know their distribution, then reliable buffer sizing and positioning can be achieved very easily: Simply pick a starting and an ending percentile line to represent your MMR buffer.

For instance, a reasonably good buffer could be between the 60th and the 85th percentile lines. A very aggressive buffer could be one between the 70th and the 85th percentile lines. The advantage of using buffers is that you can size them dynamically, even during execution. If you find that your buffer is lazy, make it more aggressive (i.e. smaller) and/or position it sooner. If it gives off too many signals, make it less aggressive and/or position it later. After a while, you will be able to find the percentile lines that work best for your kind of process.
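A small sketch of percentile-based buffer placement might look like this. The data is invented, the nearest-rank percentile is an assumed definition, and the `buffer_consumption` helper (the fraction of the buffer already used up by an item of a given age) is an illustrative addition rather than a prescribed TameFlow formula.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of the observed values."""
    if not values:
        raise ValueError("need at least one data point")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_buffer(cycle_times, start_pct, end_pct):
    """Return (buffer_start, buffer_end) taken from two percentile lines."""
    if start_pct >= end_pct:
        raise ValueError("start percentile must be below end percentile")
    return percentile(cycle_times, start_pct), percentile(cycle_times, end_pct)

def buffer_consumption(age, start, end):
    """Fraction of the buffer consumed at a given age, clipped to 0..1."""
    return min(1.0, max(0.0, (age - start) / (end - start)))

# Hypothetical Cycle Times in days (illustrative data only).
cycle_times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9,
               10, 12, 14, 15, 18, 21, 25, 30, 34, 45]

reasonable = percentile_buffer(cycle_times, 60, 85)
aggressive = percentile_buffer(cycle_times, 70, 85)
print("60th to 85th percentile buffer:", reasonable)
print("70th to 85th percentile buffer:", aggressive)
print("Consumption at day 20:", buffer_consumption(20, *reasonable))
```

Changing the two percentile arguments is all it takes to resize or reposition the buffer, which is what makes it practical to adjust during execution.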

Since Cycle Time data becomes more reliable the better you uphold the conditions of Little’s Law, the implication is that when using the TameFlow Approach, you should really strive to preserve those conditions. They are the key to getting a predictable process, and that in turn is the key to getting a buffer that is correctly sized and placed according to the percentile lines.

Dan’s work definitely reinforces what we are already doing with TameFlow, which confirms how important his book is from a TameFlow perspective. A real must-read!