Latches, Spinning and Queues

You know you care a lot about a topic when you find yourself thinking about it again and again, each time hoping to gain enough insight to stop the cycle.

Turns out, I care a lot about what my CPUs are doing. Last time, I had the epiphany that 100% CPU utilization is a bad idea. During the discussion that followed, Jonathan Lewis and Noons took the time to explain to me the difference between waiting and spinning.

The topic came up again as I was digging into the worst ways concurrency can go wrong.

Concurrency becomes interesting when concurrent processes attempt to access shared resources, and since Oracle uses shared memory, the shared resources tend to be areas of memory.

We are in 11g now, so we have three Oracle ways to protect memory structures – locks, latches and mutexes (Oracle mutexes, which should not be confused with OS mutexes). Below, I’ll summarize the main differences between them. Nearly everything I wrote (and a lot more, including cool examples) is covered in Tom Kyte’s Expert Oracle Database Architecture book. I’m just summarizing the important points below for my (and your) convenience.

When you read about latches, the first thing you hear is that “latches are lightweight locks”. Lightweight in this case means “takes fewer bits in memory”. Why do we care about our locking structures being small? A small memory footprint for the locking mechanism translates to faster checks and changes to it. Latches are smaller than locks, and the mutexes introduced in 10g are even smaller than latches.

Locks work by queueing. One process holds the lock for a resource; everyone else who wants it queues up and goes to sleep (i.e., off the CPU). When the current process finishes, the next in line becomes runnable and now owns the lock on the resource. Queueing is nice because we have a bunch of queueing theory that lets us predict response times, waits and such. It is also nice because while Oracle manages the locks and queues, it gives us tons of information about who is blocking and who is waiting. And one last nice thing: while all those processes are waiting for locks, they are not using CPU, nor do they add any CPU scheduling overhead.
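To make the queueing idea concrete, here is a toy FIFO lock in Python. It only illustrates the semantics – waiters sleep off the CPU, and the releaser hands the lock to the head of the queue – and is nothing like Oracle’s actual enqueue implementation:

```python
import collections
import threading

class QueueLock:
    """Toy FIFO lock: waiters sleep in a queue instead of burning CPU.
    Purely illustrative - not Oracle's enqueue mechanism."""

    def __init__(self):
        self._mutex = threading.Lock()        # protects the state below
        self._locked = False
        self._waiters = collections.deque()   # FIFO queue of sleeping waiters

    def acquire(self):
        with self._mutex:
            if not self._locked:
                self._locked = True
                return
            wakeup = threading.Event()
            self._waiters.append(wakeup)
        wakeup.wait()   # sleep (off the CPU) until the releaser wakes us

    def release(self):
        with self._mutex:
            if self._waiters:
                # Hand the lock directly to the next waiter, in arrival order.
                self._waiters.popleft().set()
            else:
                self._locked = False
```

Note how fairness falls straight out of the structure: arrival order is service order, which is part of what makes lock waits predictable enough for queueing theory to apply.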

Latches work by spinning (mostly). Think of a case where you know a process will need the memory structure for a very short time. Do you really want to maintain queues, waste time on context switching, and lose your CPU cache, all for just a few milliseconds of waiting? Latches exist for this reason. If a process tries to take the latch and it is busy, the process keeps retrying for a while, still using the CPU. If during this “retry time” the latch becomes free, the process takes it and we are saved from the need to context switch. If it still doesn’t have the latch after several retries, the process goes off the CPU.
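A rough sketch of the spin-then-sleep idea, with invented constants (the real spin count and sleep times are controlled by Oracle internals, and `latch.try_take()` stands in for a hypothetical atomic test-and-set):

```python
import time

def acquire_latch(latch, spin_count=2000, sleep_seconds=0.01):
    """Toy spin-then-sleep latch acquisition; all numbers are made up."""
    while True:
        # Active waiting: stay on the CPU and keep retrying, hoping the
        # holder releases the latch within a few iterations.
        for _ in range(spin_count):
            if latch.try_take():    # hypothetical atomic test-and-set
                return
        # Still busy after spinning: give up the CPU and sleep briefly,
        # then wake up and spin again.
        time.sleep(sleep_seconds)
```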

The important thing to note is that there is no queue, so there is a lot of uncertainty around when your process will finally get its latch. It is entirely possible that a process which started spinning on the latch later will get it first, through sheer luck. Because there is no queue, there seems to be no good way to find a list of processes that are waiting for a latch (maybe by looking at the statistics and wait tables?). You do get good numbers on requests, misses, spins and sleeps per latch, which is very useful.

It is interesting to see how Oracle attempts to prevent the situation where a process waits forever for a latch, and keeps missing it because newer processes keep snatching the latch away as soon as it is freed. On “latch free” wait events, the documentation says: “The wait time increases exponentially and does not include spinning on the latch (active waiting). The maximum wait time also depends on the number of latches that the process is holding. There is an incremental wait of up to 2 seconds.” It is nearly the same mechanism Ethernet uses to keep machines from starving for network access (“truncated binary exponential backoff”). Incrementally increasing the wait times reduces the probability of collision.
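In code, the backoff might look something like the sketch below (the base wait and jitter are my guesses for illustration; Oracle only documents that the wait grows exponentially, up to about 2 seconds):

```python
import random
import time

def latch_backoff_sleep(attempt, base=0.01, cap=2.0):
    """Truncated binary exponential backoff: double the wait ceiling on
    each failed attempt, capped at ~2 seconds, with random jitter so
    sleepers do not wake in lockstep and collide again."""
    ceiling = min(cap, base * 2.0 ** attempt)
    time.sleep(random.uniform(0, ceiling))
```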

Mutexes are best covered by Tanel Poder. They are even smaller and faster to take than latches; they also work as cursor pins (signifying shared or exclusive ownership of a cursor), and they give you even less information about who is waiting, for what, and for how long. You have information about sleep times, but not about the number of requests and misses.


6 Comments on “Latches, Spinning and Queues”

  1. John Brady says:

    I think you could also do with reading a bit about Queuing Theory. One of the conclusions from this is that you cannot have high utilisation of any resource without some level of queuing occurring. It may not seem obvious, but it is true. Essentially the queue of pending requests or jobs is needed to keep the resource busy most of the time. If the queue was empty then the utilisation of the resource would be very low. High utilisation is only achieved by having a queue of waiting requests, so that there is always a next thing to do when the current thing completes.

    So, yes, 100% CPU utilisation is bad, and is symptomatic of very large queues of waiting processes. Queuing Theory also shows that at 50% utilisation of a resource, the queue of waiting requests is already one on average. Note the ‘on average’ – sometimes it can be empty and sometimes it can have several requests in it – but on average the number of waiting requests will be one.

    A general rule of thumb is to get worried at 80% utilisation, as the queue of waiting requests will average around four, and it rises steeply above that. Check out the following for some more explanation and a nice graph of queue length versus utilisation. I know it is a Microsoft article, but it does summarise the effects of queuing well.
    http://technet.microsoft.com/en-gb/library/cc181450.aspx
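
    To put rough numbers on this, here is a quick back-of-the-envelope sketch in Python, using the U / (1 – U) formula that comes up later in this thread (see the discussion below for what exactly that quantity counts):

    ```python
    # Average number of requests in a single-server (M/M/1) system: N = U / (1 - U)
    for u in (0.5, 0.8, 0.9, 0.95):
        print(f"utilisation {u:.0%}: about {u / (1 - u):.0f} request(s) on average")
    # 50% -> 1, 80% -> 4, 90% -> 9, 95% -> 19
    ```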

    John

  2. prodlife says:

    Thanks for the detailed reply.

    This is very helpful input, as I’m in the middle of trying to figure out how the queueing-theoretic limit of 80% average utilization should translate to CPU monitoring on our database server.

    For example, 80% average utilization actually means that you will have 100% utilization for short time frames every once in a while. So, how long can we accept 100% utilization on the server? Obviously 6 hours at 100% and 6 at 0% gives a nice 50% utilization, but will not be acceptable to our users. On the other hand, 30 minutes at 100% utilization doesn’t seem (empirically) to create very significant queues.

  3. John Brady says:

    Having said all that about Queuing Theory and averages, you could theoretically also achieve 100% CPU utilisation by only having a few very active processes and no queues. So on a 4 CPU system you could achieve 100% CPU utilisation with only 4 processes, each always busy and executing instructions. In this scenario there is no queuing for the CPUs, as there are only 4 processes, and the efficiency of the system is good.

    However, such a scenario is very, very unlikely. The processes could never block for anything and would always be executing CPU instructions. Which means that they would not be doing any disk I/Os, or communicating with each other by some kind of messages, or using locks or latches on shared structures. Which is the complete opposite of the Oracle Architecture.

    An Oracle database server will have many processes running on it – one shadow server process per connected session (unless you are using the Shared Server configuration) – and they will use shared memory, and locks and latches to control data consistency, and perform lots of disk I/Os.

    So an Oracle database server does conform to the general model in Queuing Theory of having lots of separate clients making requests on the resources in the system. And as a result, it does conform to the golden rule of high resource utilisation equals queues of pending requests.

    These queues can apply to all aspects of a computer system – CPU, Disk, Network and Memory. To drive any one of these at 80% utilisation or above means that you have queues of pending requests, which are needed to keep the utilisation that high.

    The net effect of such high utilisation is an increase in response time. The total elapsed time for a given request is now the wait time in the queue plus the service time of the request itself. When the queue is 4 long on average, the response time is actually 5 times the service time, e.g. you might spend 40 ms waiting to perform a disk I/O that itself takes 10 ms. So high utilisation and long queues have a direct effect on the elapsed time of individual transactions.

    The formula for the Queue Length (Q) in proportion to Utilisation (U) is:
    Q = U / (1 – U)
    And Response Time is Service Time / (1 – U).
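
    As a quick sanity check of the 10 ms disk I/O example above, here is the arithmetic in a few lines of Python (these formulas assume a single queue with random arrivals):

    ```python
    service_time = 10.0                     # ms for one disk I/O
    for u in (0.5, 0.8, 0.9):
        response = service_time / (1 - u)   # Response Time = Service Time / (1 - U)
        waiting = response - service_time
        print(f"U={u:.0%}: {response:.0f} ms total = "
              f"{waiting:.0f} ms waiting + {service_time:.0f} ms service")
    # At 80% utilisation: 50 ms total = 40 ms waiting + 10 ms service
    ```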

    It depends on whether your goal is total throughput irrespective of transaction response time, or individual transaction response time. In other words, is it an online system, or more of a batch system processing relatively large units of work?

    While Queuing Theory can seem theoretical at times, to me it reinforces the message that when you scale up the load on any system there will always be a bottleneck. And that bottleneck will reach high utilisation as it nears its capacity, and queues will form in front of it. Identifying where the queues are in a system, what the bottleneck is, and what can be done to fix it – reduce service time or increase capacity – are key to performance tuning. And an appreciation of Queuing Theory has helped me get a deeper understanding of this area.

    Hope this helps with the explanation of things.

    John

  4. prodlife says:

    John,

    This is a terrific summary of the subject.

    Maybe you should post it in your blog 🙂

  5. Val says:

    John,

    “The formula for the Queue Length (Q) in proportion to Utilisation (U) is:
    Q = U / (1 – U)”

    That’s actually the formula for the total number of jobs in the system, not just in the queue (with some quite strong assumptions too).

  6. John Brady says:

    Val,

    Intuitively I want to disagree with you. See my example of 10 ms disk I/O and 50 ms response time. Intuitively I want to believe that the queue length is 4.

    But I have checked my notes and you are correct. The formula actually gives the total number of requests within the system, both within the queue and currently being serviced. Exactly as you said.

    Thinking about it, I can see that the number of requests in the system at any given moment can be much larger than 4, and at other times much smaller, with corresponding changes to the queue size too. But the overall average number of requests in the system is 4, from the formula. And the response time can average out to 50 ms, due to the effects of the sometimes larger queues.

    And as you say there are many assumptions behind these formulae, such as randomly distributed arrival times. I stand corrected.

    John

