Okay, so capacity planning will never be foolproof. You simply cannot predict the future. However, some of the time you have a darn good idea of what the future will hold. If someone knows what is likely to happen, why is it so hard to plan marketing initiatives, funnels and IT provisioning?
The reason is that things aren’t always linearly correlated. What does that mean? Linear correlation goes something like this: if A depends upon B and I want twice as much A, I’ll need twice as much B. While correlating non-linear systems can be tricky, a lot can be done with linear regressions. The catch with any regression is that you need to put real numbers in, get real numbers out, and understand how good those numbers are.
When we look at how something grows, one of the most common tools in the statistics arsenal is a least-squares linear regression. That is: given a set of datapoints, what line best fits them (the line that minimizes the sum of the squared vertical distances between the points and the line)? So, let’s say we have a lot of datapoints (boy, do we have a lot of datapoints!). Now what does a linear regression tell us?
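For readers who want to see what such a fit actually computes, here is a minimal sketch of an ordinary least-squares line fit in plain Python. It is an illustration under our own assumptions, not Circonus's implementation, and the function name `fit_line` is hypothetical. Alongside the slope and intercept, it returns the R² (coefficient of determination) goodness-of-fit value.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared x deviations and sum of x/y cross-deviations around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```

For example, `fit_line([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0])` returns `(2.0, 1.0, 1.0)`: a slope of 2, an intercept of 1 and a perfect fit. With real traffic data, x would be a timestamp (say, days since the start of the window) and y the measured value.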
Let’s assume we’re looking at some traffic data over the month of December.
In this graph, it can be very hard to answer questions about the nature of the data. Two common questions are:
- are we growing or shrinking and by how much?
- if we stay on the current growth path, where will we be some point in the future?
Enter the linear regression:
Answering the first question is now pretty simple. We can read the value on the left side of the graph and the value on the right side and do the math. You can’t see it in the screenshot, but those values are 5.49M and 5.88M: an increase of 0.39M on a base of 5.49M, or roughly 7.1% growth over 4 weeks. Now, any statistician will scream bloody murder about confidence in the data and the model, and any engineer will simply ask: “does that make sense?” We might also look at 8-week and 12-week windows to build our confidence (this can be easier, though far less scientific, than interpreting R² values, which are, of course, available as well). Honestly, I find that reconciling the model with my expectations is one of the better ways to trust it.
Let’s assume that we expected some increase in resource usage during this time frame and that roughly 7% is reasonable. Now on to the next question: where will we be in the future? In Circonus, we just jump up and extend our view window out one year, and we can see what our model looks like in the future:
Next December we’ll be using 10.91M (this just happens to be Mbit/s of network bandwidth serving origin dynamic content on one of the sites managed over at OmniTI). We’ll revisit this month by month to make sure we are indeed heading where we expected. Having a concrete number lets engineers, marketers and executives alike put real figures into what we call napkin math, which brings peace of mind and clarity and makes what-if pontification easier for most people. I can tell you one thing… we sleep better at night knowing specific numbers about a probable future.
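The growth and projection arithmetic above can be sketched in a few lines of Python. This is a hedged illustration, not how Circonus computes it: the helper names `percent_growth` and `project` are hypothetical, and the sketch assumes the December window spans exactly 28 days with x measured in days. Growth is measured relative to the starting value, so going from 5.49M to 5.88M comes out to about 7.1%.

```python
def percent_growth(start, end):
    """Relative change from start to end, as a percentage of the start value."""
    if start == 0:
        raise ValueError("start value must be non-zero")
    return (end - start) / start * 100.0


def project(slope, intercept, x):
    """Evaluate the fitted line y = slope * x + intercept at a (future) x."""
    return slope * x + intercept


# Endpoints of the fitted December line quoted above, in millions (Mbit/s).
start, end = 5.49, 5.88
print(f"growth over the window: {percent_growth(start, end):.1f}%")  # 7.1%

# Assumption: the window is 28 days long and x is measured in days from its
# start, so the line through the two endpoints has this slope and intercept.
window_days = 28
slope = (end - start) / window_days
intercept = start
# Extrapolate 365 days past the end of the window.
print(f"projected value: {project(slope, intercept, window_days + 365):.2f}M")
```

Because this sketch draws a line through just two rounded endpoints and assumes a 28-day window, its projection lands near, but not exactly on, the 10.91M figure; the real model fits every datapoint over the actual time range.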