Bill Schmarzo, Day 3 | Strata Data Conference 2012
A fellow InFocus blogger, EMC Consulting's Bill Schmarzo, recently posted about "Finding the Right Metric." He used fielding percentage vs. effective fielding range as a great example of how Big Data can help general managers identify better predictors of a baseball fielder's performance, without encouraging bad player behavior.

As a Data Scientist, you will often be asked to help the business make better decisions. In doing so, you must keep your different audiences in mind. The tendency is often to take a one-size-fits-all approach to business intelligence. It's a good idea in theory, but as we've seen, it can have unintended consequences.

Here is a great example from the world of support: measuring the Service Level Agreement (SLA) can be an effective way to see if your support organization is responding to your customers in a timely manner (within agreed-upon timeframes). Sounds simple, right? Be very careful with this metric, though. I have seen plenty of versions over the years, many of which -- in my opinion -- can be misleading. You must always keep in mind, "What are we trying to do?" Here's what I mean:

Executive Level

At the executive level, it is very straightforward: What percent of cases met SLA? For example, "95 percent" means 95 percent of the cases were responded to within the agreed-upon timeframe. Upper management typically looks at trends to see if things are getting better, getting worse, or staying steady. These numbers can be cut by create method (e.g. Phone, Web, Chat, Dialhome) and by issue severity to highlight any specific areas that need attention.

(Chart is for example purposes only.)

Manager Level

For a manager-level audience, you need to approach this metric quite differently. Clearly, managers need more detail. But we also need to be careful how we message this to managers and their engineers. I DO NOT recommend measuring engineers by SLA. It encourages the wrong behavior.
If an engineer comes online and there are cases sitting in the queue that have already missed the SLA, the engineer will never touch them. Why would they, if they are immediately penalized for taking such a case? In the case of baseball players "gaming the system" to improve their fielding percentages, the solution was not to penalize players, but instead to use available data in a new way to find a more effective metric for measuring performance. The same approach applies here.

This gets us back to the question, "What are we trying to do?" We want to measure how well we are responding to customers in a timely manner, right? In our executive view, a one-second miss is a miss. But in our manager view, when we do miss, we want to measure the degree of the miss. Because of the number of service contracts and severities, the easiest way to roll it up is to measure SLA on a scale of 1X to 5X: 1X means service was within the SLA; 2X means it was over the SLA but not more than two times the SLA; and so on. For example, with a 1-hour SLA, anything between 1 hour and 2 hours is a 2X miss.

In EMC Customer Support, our messaging is to drive "greater than 5X" down to zero and work your way up from there. Basically, if you miss, don't miss big. Limiting the number of "big misses" drastically reduces the number of escalations -- and by highlighting these big misses, you can expose flaws in the business processes. Misses in the 2X-5X range can then be sliced by create method, severity, time of day, and day of week to isolate problem areas, helping you determine whether your business has a small problem or a major one. At least now you can see it and recommend action.

(Chart is for example purposes only.)

Hopefully I didn't lose you on this one. My point is that one size doesn't fit all, and metrics don't always translate to all levels in a business. The concept usually does, but the way you show it may need to be different.
MOST importantly, be careful with how you use this information and how you message it, because you could be encouraging very bad behavior.
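
For readers who want to try the 1X-to-5X roll-up described above, here is a minimal sketch in Python. It is only an illustration, not the tooling EMC Customer Support uses: the function names `miss_bucket` and `summarize`, the choice of minutes as the unit, and the sample cases are all assumptions made for the example. It treats a response exactly at the SLA as met (1X) and anything over five times the SLA as ">5X".

```python
import math
from collections import Counter


def miss_bucket(response_minutes, sla_minutes):
    """Return the SLA multiple bucket for one case: '1X' to '5X', or '>5X'.

    Hypothetical helper: '1X' means the response was within the SLA,
    '2X' means over the SLA but not more than twice the SLA, and so on.
    """
    if sla_minutes <= 0:
        raise ValueError("sla_minutes must be positive")
    # Round the ratio up to the next whole multiple; never go below 1X.
    multiple = max(1, math.ceil(response_minutes / sla_minutes))
    return f"{multiple}X" if multiple <= 5 else ">5X"


def summarize(cases):
    """Summarize (response_minutes, sla_minutes) pairs.

    Returns the executive view (percent of cases that met SLA) and the
    manager view (count of cases per miss bucket).
    """
    buckets = Counter(miss_bucket(r, s) for r, s in cases)
    total = sum(buckets.values())
    pct_met = 100.0 * buckets["1X"] / total if total else 0.0
    return pct_met, buckets


# Made-up sample data: (response time, SLA), both in minutes.
sample_cases = [(30, 60), (60, 60), (61, 60), (120, 60), (301, 60), (500, 240)]
pct_met, buckets = summarize(sample_cases)
print(f"Met SLA: {pct_met:.1f}%")
for name in ["1X", "2X", "3X", "4X", "5X", ">5X"]:
    print(name, buckets[name])
```

The same per-case bucket could then be grouped by create method, severity, time of day, or day of week to look for the problem areas mentioned earlier.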