John Brady, wrote a short summary about queue theory and its relevancy to databases. Reading it reminded me of some mistakes I often see when queuing theory is discussed.

Lets start by introducing a concept that John somehow forgot to mention – randomness.

Queue theory is about randomness. The  rate in which requests arrives to your system is random, the amount of time that takes the server to process each request is also random.

When put in terms of randomness, utilization is the probability that at a given point of time the server is busy. Which is a good definition for utilization because we all know that at a point of time, CPU is either busy or not, 0% or 100%. When we talk about utilization we must talk about averages and probabilities.

This is a good place to note that when your average utilization is 50% there is still high probability of extended periods where  the system will be in 100% cpu and will have queues. This is normal result of our planning and cannot really be avoided. You can’t really design a system where the probability for 100% CPU is zero. It doesn’t make sense to do so. Now if only my managers would get that.

So, utilization is the probability that the server is busy, which is identical to the probability that the customer will have to stand in queue. How long will he have to stand? That depends on how many other customers are in the queue and how fast the server can handle them.

There is very little we can say about our queue without knowing something about the rate that work arrives and how fast the server handles them. There are some common assumptions used in queuing theory.

Without assuming those distributions the only thing we know is Little’s Law, which says that the expected number of customers in the system is equal to the rate of arrival multiplied by the average amount of time a customer spends in the system. This law only assumes that the rate of arrivals is not dependent on the queue length (note that even this simple assumption is not always true. Customers may turn away if they notice a queue).

From what I understand (and I may be mistaken here), this law does not translate into queue length and utilization, unless you have additional information about the distribution of arrival rates and service times.

So, we normally assume some standard random distribution, and we talk about average response times, average utilization, average arrival rates (poisson distributed) and average service times (exponentially distributed).

This can work well, but you need to take into account few things:

• There is nothing random or average about the high utilization that occurs at the end of each quarter, just when the users expect better than average response times. “Yes, I know that we agreed for 8 seconds on average, but I’m trying to get the bonuses out on time this month”. You need to account for these events separately.
• The occasional bug that causes a report to take 100% CPU for over 6 hours while everything queues may be random, but is always more frequent than you assume when ordering hardware. Murphy’s law trumps Little’s law every time.
• Almost queuing theory equations assume that when you have multiple servers, they are independent of each other. Service time in one server will not affect service times on others. If you actually plan to use queue theory, TEST THIS ASSUMPTION. You should test all assumptions, but this one is the least likely to be true. Suddenly you discover that your independent storage devices share a switch.
• The best advice about performance predictions is from Cary Millsap – “Measure once – cut twice”. He means that your assumptions are wrong, your estimates which required lots of hard work, are not much better than guesses, and you must build time into the schedule to fix all the sizing mistakes that will come up in the first week the system goes production. As Fredrick Brooks said – “plan to throw one away; you will, anyhow.” He was talking about the first version of a software project, but it seems to apply well to first capacity estimate as well.
• Incidentally, Cary Millsap also gives excellent advice and scripts on how to test that your arrival rates and service rates really match the assumed distributions. You can use his advice (chapter 9 of Oracle Performance book) to make your estimates a bit less of a random guess.
• If you are planning to run 256 concurrent users on 16 CPU system, you cannot test this by running 64 concurrent users on 4 CPU systems.Your system contains a part that doesn’t scale linearly, and the system that performs well in the tiny QA environment will be a disaster in production. This is the kind of catastrophes that you only need to experience once. Since buying another 16 CPU server for QA is out of question, you have two options here.  One is to have 4 completely separate 4-CPU-64-users environments in production. The other is to do load testing in production. It is scary, but it can be done safely if you find someone who knows that he is doing.

1. chris_carr says:

Way back in time I worked in a call centre first as a tech support agent, then as a reporting analyst all this queuing theory reminds me of the calculations used for determining the number of trunks and staff, mainly based on erlangb and erlangc (I even have an some old sqlserver code I wrote to do the calculations), a couple of other things worth noting are:

1: It is assumed that the queue doesn’t consume the resource you’re waiting on, i.e. if you consume cpu time to manage the queue for the cpu you have worse performance than expected.
2: Idle time is not wasted it is the cost of the required performance or service time, you cannot use this to do other work without impacting the response time.
3: The basic Erlang calculations deal with whole servers (CPU’s) virtulisation can really screw this up, those 4 cpus could actually be anywhere between 10% and 100% of a real CPU each, and may change based on demand elsewhere.
4: The stuff I’ve done assumes that a transaction keeps a server busy continuously for the whole transaction and will complete before the next transaction is started, however I’ve seen cases where a transaction can use 30 seconds of cpu time in small chunks in this case two transactions started at the same time could (in the worst case) result in both transactions taking approximately 60 seconds to complete.

All this is making me nostalgic, I think I’ll go rewrite those calculations in PL/sQL 🙂

2. Great article, just in time to show it to some colleagues (or to smash their nose on it).
I’m just missing these things which make our all work more interresting: locks, latches, mutexes etc. and the waits on them. (as they probably do not represent CPU, but everything else)
As long as a process gets a CPU when all required resources are available, it can do its work and release this resource itselve.
But if it’s waiting in the queue, it can not even do its work later, but also release the resources later.
So there is not only the queue on CPUs, but also on all other shared, limited but eligible resources.
Even it’s quite well known to get the runqueue on the CPUs, it’s not that often measured on SCSI-hostqueue or NICs and I have not even a clue how to find informations for memory-controller queue.

3. […] Chen Shapira, the Simple DBA, offered some good theory and practical application on queues. […]

4. […] A couple of posts by Chen Shapira, (this one barbershop-queue-demo and this one random-thoughts-about-queues)  over at prodlife.wordpress.com regarding queuing in systems and queuing theory piqued my […]