One More Thing Everyone Should Know About Queues

I already posted two things everyone should know about queues, but the incidents of the last month made me realize I missed another very important queue fact.

Don’t double the load on a server and expect that response times will be “a little slower”. If the server is above 30-40% capacity, doubling the load is likely to be catastrophic.

Suppose you have two servers, for example a Netapp heads, that are operating at 48% capacity each, and suddenly one of them fails and you have to move all the load to one of them. This means that you have one server at close to 100% capacity. Users will not complain that things are a bit slow, they will complain that nothing is working.

Someone once told me that if I have two RAC nodes running at 100% CPU each, I do not have high availability. The truth is that you stop having high availability long before the 200% CPU point.

Oh, and in case anyone wonders what we did about the overloaded Netapp. We stopped every non-essential process reading from this Netapp. This included snapmirrors, non-production databases, exports, vmwares, and probably more stuff than I know. This moved utilization down to 60% point and life was good (except that we weren’t too comfortable about lack of snapmirrors and exports).

Advertisements

HotSos, Concurrency, Papers and Related Thoughts

So I finally finished my “Seven Sins of Concurrency” paper. The one I’m going to present at HotSos Symposium next month.

Writing a paper was a very educating experience. In the few weekends I’ve been working on it, I learned more about Oracle than I did during the last year. Its been tiring and exhilarating. Of course I learned all sorts of new things about concurrency, consistency, locks and latches, but I also learned about merge, autonomous transactions and packages.

I used to think that writing a paper is a bit like writing a technical blog post. There are some similarities, but the big difference is that blogging is fun, while writing a paper goes beyond fun into painful. Kind of like riding my bike. I do it for fun, but when I train for a race, it is pure pain. Of course, this pain is what leads to improvements. Still, I would need a long time to recover from this.

Writing a paper and publishing it is also very scary. The paper contains some things that were not done before, otherwise it will not be worth while. But since they were never done before, there is some chance that I got things completely wrong. Since I published all my scripts and data, if I’m wrong, someone can find out how wrong I am. While I love corrections and always want to improve, the idea of being wrong this publicly scares me a lot. I’m putting a lot of myself out there for the world to criticize. Of course, going this far out of one’s comfort zone is also critical for improvement. But I can understand presenters who give those lists of tips and best practices – it seems less scary.

While thinking about the paper, and reading some other papers for inspiration, I noticed three interesting things that separate the papers I see in the Oracle world and the papers I see from the academia:

  1. Academic papers are usually written by more than one person, which is quite rare for Oracle papers.
  2. Academic papers always reference the latest and greatest work of their peers. Oracle papers, when they reference at all, usually reference older and more established work. Academics always present their work as improvement on something their peers did, while in the Oracle world we try to present our papers as original, even when it is not.
  3. Academic papers are published in peer reviewed journals. Oracle papers are rarely published and rarely peer reviewed.

I think that point #3 is the cause of the previous points. Since we don’t have peer reviewed journals with their associated prestige, all the prestige in Oracle world belongs to conferences and presentations. We write papers for conferences, and since we plan to be the one presenting on stage, we don’t collaborate. Of course, since there are no journals, it is more difficult to reference and improve on the most recent work done in our field. Just keeping updated is challenging.

It looks like a peer reviewed journal could greatly improve the quality of scientific Oracle papers out there. Of course, peer reviewed journal would be expensive and may not be justifiable from business perspective. Maybe we should start with a peer reviewed blog. I even have a title – “Oak Table Chronicles” 🙂


DDL Implicit Commit

We all know that DDL implicitly commits transactions. However, I was not aware that it commits both BEFORE and AFTER the DDL.

I ran the following code from session 1:

SQL> create table t1 (x number);

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> commit;

Commit complete.

SQL> create or replace procedure p1 as
2 begin
3 update t1 set x=2;
4 commit;
5 end;
6 /

Then in session 2:

SQL> update t1 set x=3;

1 row updated.

Now back in session 1:
SQL> exec p1

Session 1 promptly hangs, due to session 2 locking the row in t1.

Now in session 2, I run:

SQL> drop procedure p1;

I expected session 2 to hang as well, which makes it a deadlock.

However, what I saw was:
Procedure dropped.
in session 2 and
PL/SQL procedure successfully completed.
in session 1.

Which could only happen if the DDL drop procedure commited session 2, which caused session 1 to finish running the procedure, which enabled the drop to work.

Coffman et al (1971) showed that four conditions are necessary for a deadlock. The third condition is – no preemption – resources granted cannot be taken away from a process. By causing DDLs to force a commit before they run, Oracle forced preemption and avoided deadlocks.

I like that. Sometimes I hate the fact that DDLs force a commit, but now I see the benefits.
I’ve heard that in Postgres DDLs are transactional and can be rolled back. I wonder if my scenario would deadlock in Postgres.


Barbershop Queue Demo

Yet another classic concurrency problem.

This time, we have a barbershop. The shop employees one or more barbers. Customers come into the shop. If one of the barbers is free, he will give the customer a haircut. If all barbers are busy, the customer will wait in line until one of the barbers can take care of him. Barbers will handle customers on “first come first serve” basis. Of course, we don’t want to have two barbers working on the same customer at the same time, or giving haircuts to customers who no longer need them. We also don’t want barbers blocking each other, or blocking the entrance to the shop while working.

Its a nice problem, because you can check different queuing scenarios – fixed number of barbers while the number of customers grows, or how many barbers are needed to operate a shop with about 10 customers every second.

You can see my code here.

Normally, you would start one process running add_customers, with parameters telling it how many customers on average enter the shop every second, and how long should the demo run.
Then you add a process for each barber, with parameters for how long it takes to give a haircut, and how long the demo will run.

Something like:

(echo exec barber_shop.add_customers\(3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&

Comments are welcome 🙂

Note: On rare occasions, the “add_customers” process waited over a minute between updating the customers and committing. “add_barber” did not run at all during that minute.
I suspect either my OS getting stuck somewhere or Oracle’s resource manager. However, when it happened I did not check v$sesstat or ASH to see what happened during that minute. Now it doesn’t reproduce, and I don’t have the session_id that had the issue to investigate.

If anyone tries running my code and runs into the same bug, please collect helpful events and timed statistics that may help debug it. I’ll buy a beer to anyone providing information that helps catch the bug (unless they are non-drinkers, in which case they can get orange juice 😉 ).


Dining Philosophers

The Dining Philosophers problem is probably the most famous of the classic synchronization problems. Posed and solved by Edsger Dijkstra in 1965, it became famous not because it is practical, but because it demonstrate in a nicely visual way some of the dangers inherent in careless sharing of resources.

Five philosophers are sitting around a round (oak) table. Each of the five philosophers has a bowl of rice, and between every two bowls there is a single chopstick. As illustrated in the following diagram:

dp

Philosophers never talk to each other. They spend their time thinking. When a philosopher becomes hungry, she attempts to pick up the two chopsticks closest to her (the ones that are shared between her and her left and right neighbors). Each philosopher can only pick up one chopstick at a time. Then she eats, without putting down the chopsticks. When she is done, she puts both chopsticks down and resumes thinking.

We must find a method that will allow the five philosophers to share five chopsticks and eat without letting any philosopher starve from prolonged lack of access to chopsticks.

As I wrote above, this is a very famous and non-practical concurrency problem. Surprisingly, no one demonstrated it in PL/SQL on Oracle database yet.

So, here are three different “solutions” to the dining philosopher’s problem for Oracle.

The code you’ll find in the link contains several sections:

  1. Setup code to create the tables with the philosophers and their chopsticks.
  2. set_status procedure that allows the philosopher to announce their status (eating, hungry) to us, so we can monitor their progress. This is an autonomous transaction, because we don’t want the philosophers to drop their chopsticks while announcing that they are eating.
  3. Three different eating functions, representing the three typical solutions to shared resource problems.
  4. add_philosopher code that will allow us to add philosophers to the table and watch them compete for chopsticks.

I also have a nice shell script that runs the five philosophers we need to make the problem interesting.

Now, some explanation about the three eating procedures.

  1. The first one is the most trivial one. A philosopher grabs the right chopstick, then the left one and then eat. If any stick is busy, we wait. Note that for some time efficiency (and CPU killing) the philosophers eat very very fast in this version. Run this version and very soon, all philosophers will simultaneously pick up their right sticks and wait for the left to become free. They could wait a long time, but Oracle will soon choose one of them as the deadlock victim and allow the other four to proceed eating in peace.
  2. An attempt of avoiding deadlocks – Each philosopher picks up her right stick, and if she sees the left one is taken, she drops her right stick and tries all over again. This will ensure no deadlocks (since waiting philosophers will not hold their sticks), but we are likely to encounter a situation where a philosopher keeps picking up and dropping his right stick without ever getting to eat. It is unlikely that this will continue forever, but it may continue enough to make our philosopher very unhappy with her response times. Since each philosopher should be able to eat on average 400ms every second, if a philosopher did not eat for an entire second we will consider her dead from starvation. If you run this example, you should see at least one philosopher dead from starvation.
  3. At last a good solution! We avoid starvation by not dropping the stick we are already holding, and avoid deadlocking by switching the order in which we pick up the sticks. One of our philosophers will pick his left stick first. This solution is also called “partial hierarchy” – we make sure the blocking graph will be a tree and not a cycle. Run this example and see that you encounter no deadlocks and no starving philosophers.

Readers who made it this far – this example will be a part of my HotSos presentation (demonstrating two concurrency sins – deadlocks and starvation). Any comments are appreciated. If you ran my code and had problems, or worse, failed to run into dead philosophers, let me know. If you think my code plain sucks, let me know. If you didn’t understand what I’m trying to say or why I’m trying to say it, also let me know.


Latches, Spinning and Queues

You know that you care a lot about a topic, if you find yourself thinking about it again and again, each time hoping to gain enough insights to stop this cycle.

Turns out, I care a lot about what my CPUs are doing. Last time I came up with the epiphany that 100% CPU utilization is a bad idea. During the discussion that followed, Jonathan Lewis and Noons took the time to explain to me the difference between waiting and spinning.

The topic came up again as I’m digging for the worse ways concurrency can go wrong.

Concurrency becomes interesting when the concurrent processes attempt to access shared resources, and since Oracle has shared memory, the shared resources tend to be areas in the memory.

We are in 11g now, so we have 3 Oracle ways to protect memory structures – Locks, latches and mutexes (Oracle mutexes, which should not be confused with OS mutexes.). Below, I’ll summarize the main differences between them. Nearly everything I wrote (and a lot more including cool examples) is covered by Tom Kyte’s Expert Oracle Database Architecture book. I’m just summarizing the importnat points below for my (and your) convinience.

When you read about latches, the first thing you hear is that “Latches are lightweight locks”. Lightweight in this case means “Takes less bits in memory”. Why do we care about our locking structures being small? Small memory footprint of the locking mechanism will translate to faster checks and changes to it. Latches are smaller than locks, and the new 10g mutexes are even smaller than latches.

Locks work by queueing. One process holds the lock for a resource, everyone else who tries to access the lock queues up and goes to sleep (i.e. off the CPU). When the current process finishes, the next in line becomes runnable and now owns the lock on the resource. Queuing is nice, because we have a bunch of queueing theory that lets us predict response times, and waits and such. It is also nice, because while Oracle manages the locks and queues it gives us tons of information about who is blocking and who is waiting. And as one last nice, while all those processes are waiting for locks, they are not using CPU, nor represent any cpu scheduling overhead.

Latches work by spinning (mostly). Think of a case when you know a process will need the memory structure for a very short amount of time. Do you really want to maintain queues, waste time on context switching, lose your CPU cache all for just few milliseconds of waiting? Latches exist for this reason. If a process tries to access the latch and its busy, it will keep on retrying for a while, still using the cpu. If during this “retry time” the latch became free, the process can take the latch and we are saved from the need to context switch. If it didn’t get the latch after several retries, the process goes off the CPU.

The important thing to note it that there is no queue. So there is a lot of uncertainty around when your process will finally get its latch. It is entirely possible that processes that started spinning on the latch later will get the latch first due to a fluke of luck. Because there is no queue, it seems that there is no good way to find a list of processes that are waiting for a latch (maybe by looking at statistics and wait tables?). You do have good information about how many requests, misses, spins and sleeps per latch, which is very useful information.

It is interesting to see how Oracle attempts to prevent the situation where a process waits forever for a latch, and keeps missing it because newer processes keep snatching the latch away as soon as it is freed. When reading about “latch free” wait events, the documentation says: ” The wait time increases exponentially and does not include spinning on the latch (active waiting). The maximum wait time also depends on the number of latches that the process is holding. There is an incremental wait of up to 2 seconds.” It is nearly the same mechanism ethernet uses to avoid machines starving for network connections (“truncated binary exponential backoff“) . Incrementally increasing the wait times reduces the probability of collision.

Mutexes are best covered by Tanel Poder.  They are even smaller and faster to take than latches, they also work as a cursor pin (signifying shared or exclusive ownership of a cursor), and they give you even less information about who is waiting for what and for how long. You have information about sleep times, but not number of requests and misses.


Concurrency at Hotsos

Everyone says that the best way to go to a conference is to be a speaker. For the last 2 years I’ve been trying to go to Hotsos symposium, but I never got the time and budget for this.

So when Hotsos published a call for papers for the 2009 symposium, sending in an abstract or two with my ideas seemed to make sense. Nothing to lose, right?

I did not expect to have my abstract accepted. And now my name is up there in the speakers list, between Cary Millsap  and Chris Date. Somehow, I don’t feel like I fit in. Its been few month since I heard the news and I can still barely believe it.

Of course, now management had no choice, and I’ll get to go to HotSos and listen to terrific technical sessions from very smart people. Yay!

My  session is going to be about concurrency errors. Its a HUGE topic and it was discussed a lot in the past, so I’m working hard to find a unique and interesting angle on this, and to avoid reiterating topics that were discussed to death. My unique take on concurrency is taking classical concurrency problems from OS research, translating them to Oracle and show how they can be used to solve common issues in DB development. There will also be a fair bit of statistics and web servers thrown in because thats what I know, do and love.

I’m planning to talk a lot about testing because concurrency problems are notoriously difficult to test for. I’ll mention some statistical techniques to find concurrency problems, because this is something I didn’t see mentioned before, and I’m very happy whenever I can use my statistics education in real life.

I want to discuss lots of OS theory because so much of it applies directly to Oracle and I want everyone to benefit from the research that was done in a related field. This means talking about queuing and also about process management overheads.

I want to talk about the problems that many application servers inadvertly cause on the database side – such as allowing a user to click refresh on a large report again and again. I see these all the time.

I’ll also talk about starvation a bit, simply because I didn’t hear it discussed yet. And there is a very special case of deadlock that I’d love to talk about, if I could just get a good test case for it.

Thats quite a lot of stuff I want to talk about, and I’ll probably have to make some painfull cuts. This is after I already had to painfully cut a bunch of stuff that I decided not to talk about.

For example, deadlocks are the most well known and well researched concurrency mistake, so I’ll not spend lot of time on it. Why waste time when I can just point to Mark bobak’s presentation from HotSoS few years back (http://www.oaktable.net/userFiles.jsp)?

I’ll also have to skip talking about concurrency problems on RAC. Its a fascinating topic, but its a huge one on its own. Maybe next year? I’ll also have to skip talking about undo+redo overheads caused by many concurrent updates, and latch contentions, and hot blocks… these are all fascinating and relevant aspects that I just could not fit in.

I hope I’ll manage to pull all of these ideas into a good presentation. I hope lots of people will show up and enjoy it. I’m sure HotSoS will be an amazing conference. Its in two month, but I’m already very excited about it.