Autocorrelation and CausationPosted: December 7, 2009
Everyone talks about how correlation doesn’t imply causation, but no one says what autocorrelation implies.
Maybe its because we don’t talk about autocorrelation at all 🙂
Lets start talking about autocorrelation by saying what it is:
First of all, autocorrelation is a concept related to time series. Time series is an ordered series of measurements taken at intervals over time. You know how we measure CPU utilization every 10 minutes and then display nice “CPU over time” graphs? thats a time series.
Autocorrelation is the correlation between points in the time series. So we can compare every point in the time series to the measurement taken 10 minutes later, 20 minutes later, etc. And we can find out that every point in our graph is strongly correlated with the measurement taken 10 minutes later and the point taken 60 minutes later.
But does that imply causation?
We normally don’t assume that the current value of the CPU caused the value that the CPU has in 60 minutes. It makes much more sense to assume that there is a third factor that causes the CPU to peak every 60 minutes. This effect is also called seasonality. The weather today is strongly correlated with the weather on Dec 9th 2008. The third factor in this case is the circles our planet makes around the sun.
However the autocorrelation with the point immediately following the current value, often does imply causation of sorts. If you want to make a good guess about the value of IBM stock tomorrow, your best bet is to guess that it will be the same as the value today. Stock values usually have very strong short-term autocorrelation, and we can say that tomorrow’s value is todays value plus some error. IBM stock prices are normally stable, so the error is normally small. So you can say that today’s stock value is caused by today’s value. In a similar way the CPU in 2 minutes can be predicted to be identical to the CPU right now.
I’m hesitant to call this “causation”, because although the stock price today does cause the stock price of tomorrow (plus an error!), the “real” cause is that stock prices and cpus behave in a specific way. On the other hand, we know that they behave in a specific way because we measured the autocorrelation, modeled it and made predictions that work. So in two important uses of causation, understanding the behavior of the thing we measured and making predictions, we can say that we have a cause-and-effect relation. Albeit a bit less intuitive that usual.
If you dig the idea of explaining and predicting CPU and other important performance measurements by using only the measure itself without looking for other explaining factors, then you should definitely attend my presentation about time series analysis at RMOUG. I’ll show exactly how we find autocorrelations and how to predict future values and we’ll discuss whether or not this is a useful method.