Random Thoughts about QueuesPosted: January 27, 2009
John Brady, wrote a short summary about queue theory and its relevancy to databases. Reading it reminded me of some mistakes I often see when queuing theory is discussed.
Lets start by introducing a concept that John somehow forgot to mention – randomness.
Queue theory is about randomness. The rate in which requests arrives to your system is random, the amount of time that takes the server to process each request is also random.
When put in terms of randomness, utilization is the probability that at a given point of time the server is busy. Which is a good definition for utilization because we all know that at a point of time, CPU is either busy or not, 0% or 100%. When we talk about utilization we must talk about averages and probabilities.
This is a good place to note that when your average utilization is 50% there is still high probability of extended periods where the system will be in 100% cpu and will have queues. This is normal result of our planning and cannot really be avoided. You can’t really design a system where the probability for 100% CPU is zero. It doesn’t make sense to do so. Now if only my managers would get that.
So, utilization is the probability that the server is busy, which is identical to the probability that the customer will have to stand in queue. How long will he have to stand? That depends on how many other customers are in the queue and how fast the server can handle them.
There is very little we can say about our queue without knowing something about the rate that work arrives and how fast the server handles them. There are some common assumptions used in queuing theory.
Without assuming those distributions the only thing we know is Little’s Law, which says that the expected number of customers in the system is equal to the rate of arrival multiplied by the average amount of time a customer spends in the system. This law only assumes that the rate of arrivals is not dependent on the queue length (note that even this simple assumption is not always true. Customers may turn away if they notice a queue).
From what I understand (and I may be mistaken here), this law does not translate into queue length and utilization, unless you have additional information about the distribution of arrival rates and service times.
So, we normally assume some standard random distribution, and we talk about average response times, average utilization, average arrival rates (poisson distributed) and average service times (exponentially distributed).
This can work well, but you need to take into account few things:
- There is nothing random or average about the high utilization that occurs at the end of each quarter, just when the users expect better than average response times. “Yes, I know that we agreed for 8 seconds on average, but I’m trying to get the bonuses out on time this month”. You need to account for these events separately.
- The occasional bug that causes a report to take 100% CPU for over 6 hours while everything queues may be random, but is always more frequent than you assume when ordering hardware. Murphy’s law trumps Little’s law every time.
- Almost queuing theory equations assume that when you have multiple servers, they are independent of each other. Service time in one server will not affect service times on others. If you actually plan to use queue theory, TEST THIS ASSUMPTION. You should test all assumptions, but this one is the least likely to be true. Suddenly you discover that your independent storage devices share a switch.
- The best advice about performance predictions is from Cary Millsap – “Measure once – cut twice”. He means that your assumptions are wrong, your estimates which required lots of hard work, are not much better than guesses, and you must build time into the schedule to fix all the sizing mistakes that will come up in the first week the system goes production. As Fredrick Brooks said – “plan to throw one away; you will, anyhow.” He was talking about the first version of a software project, but it seems to apply well to first capacity estimate as well.
- Incidentally, Cary Millsap also gives excellent advice and scripts on how to test that your arrival rates and service rates really match the assumed distributions. You can use his advice (chapter 9 of Oracle Performance book) to make your estimates a bit less of a random guess.
- If you are planning to run 256 concurrent users on 16 CPU system, you cannot test this by running 64 concurrent users on 4 CPU systems.Your system contains a part that doesn’t scale linearly, and the system that performs well in the tiny QA environment will be a disaster in production. This is the kind of catastrophes that you only need to experience once. Since buying another 16 CPU server for QA is out of question, you have two options here. One is to have 4 completely separate 4-CPU-64-users environments in production. The other is to do load testing in production. It is scary, but it can be done safely if you find someone who knows that he is doing.