I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Modifying RAC Parameters February 24, 2009

Filed under: rac — prodlife @ 4:52 am

Few commands that proved useful today and I’ll probably want to refer to in the future:

  1. Change the voting disk timeout to 3 minutes: crsctl set css misscount 180
    This is not really recommended due to increase risk of split brain issues.
  2. Changing VIP timeouts is a longer story…
    1. As oracle:
      srvctl stop instance -d -i
      srvctl stop nodeapps -n
    2. As root:
      crs_stat -p ora.<hostname>.vip > /tmp/ora.<hostname>.vip.cap
      crs_profile -update ora.<hostname>.vip -dir /tmp -o ci=120,st=120
      (ci is check_interval, st is script_timeout, both in seconds)
      crs_register ora.<hostname>.vip -dir /tmp -u
    3. Verify:
      crs_stat -p ora.<hostname>.vip | grep CHECK_INTERVAL
      crs_stat -p ora.<hostname>.vip | grep SCRIPT_TIMEOUT
    4. As oracle again:
      srvctl start nodeapps -n <hostname>
      srvctl start instance -d <dbname> -i <instname>
  3. When the machine is moved to a new segment:
    oifcfg delif -global eth0/192.168.56.0
    oifcfg setif -global eth0/192.168.50.0:public
  4. When you have a new VIP:
    srvctl modify nodeapps -n hostname -A new_vip/255.255.255.0/eth0

I’m not one to insist on a friendly interface and certainly not on GUI, but I could appreciate a bit more consistency.

 

Is it boring yet? February 17, 2009

Filed under: musing — prodlife @ 3:11 am

unbored

Two years ago I’ve read an interesting post by Paul Vallee. He was explaining the business case behind Pythian and other IT outsourcing services, and made the point that in-house DBAs automate themselves out of work and into boredom.

It made lots of sense when I’ve read it, and since I’m a good DBA, and I try to automate everything, I’m waiting for my job to become boring. Somehow it does not quite happen.

The very scientific graph above explains part of the reason – In the last year our business nearly doubled (it was an amazing year!), while the DBA headcount was reduced by 30%. No one was fired. Actually two excellent DBAs were promoted. But they are no longer DBAs and there will be no replacements.

There are other things that keep me from getting bored.

  • We are doing load tests now (I could swear it was QAs job last year). Load tests are non-boring by definition – after you have the right tools and scripts there is still tons of non-automatable brain work involved.
  • We are planning to switch to new storage systems (maybe). This means that everything changes – from backups to DRP.
  • We keep evaluating new tools. Confio or Grid Control? UC4 or Opsware Orchestrate or Patchlink?
  • There are more and more exceptions to our standard procedures. In this economy it is risky saying “we don’t do this”, so we have to make our tools more and more flexible.
  • Oracle Streams. We had it automated from day one. But it keeps breaking and we keep discovering new bugs/limitations. This product is complex and buggy enough to keep me un-bored for years.
  • Integrations. Until this year we had seperate databases for seperate applications and we liked it that way. Now everyone wants their peopleware to talk to their project management tool, their monitors to their bug control, and of course everything should be integrated with the CMDB. Finding good solutions for this integrations and supporting them is a challenge.
  • Process improvements. We are an ITIL shop (and proud of it!), and big part of ITIL is the process improvement process. Release management processes are the big target this year.
  • Simple things sometimes fail. Just last month I’ve seen 2 DBAs, with over 30 years of experience between them, take over a week to install 10.2.0.2 RAC.  A bug, of course. But bugs still exist.

So, I did not automate myself into boredom. Maybe next year. What about everyone else? Are you bored yet?

P.S. – the graph above is part of the latest trend of visualization. If you want to be a cutting edge and trendy DBA, read Tanel Poder and Alex Gorbachev and learn how.

 

HotSos, Concurrency, Papers and Related Thoughts February 16, 2009

Filed under: concurrency,musing — prodlife @ 8:24 am

So I finally finished my “Seven Sins of Concurrency” paper. The one I’m going to present at HotSos Symposium next month.

Writing a paper was a very educating experience. In the few weekends I’ve been working on it, I learned more about Oracle than I did during the last year. Its been tiring and exhilarating. Of course I learned all sorts of new things about concurrency, consistency, locks and latches, but I also learned about merge, autonomous transactions and packages.

I used to think that writing a paper is a bit like writing a technical blog post. There are some similarities, but the big difference is that blogging is fun, while writing a paper goes beyond fun into painful. Kind of like riding my bike. I do it for fun, but when I train for a race, it is pure pain. Of course, this pain is what leads to improvements. Still, I would need a long time to recover from this.

Writing a paper and publishing it is also very scary. The paper contains some things that were not done before, otherwise it will not be worth while. But since they were never done before, there is some chance that I got things completely wrong. Since I published all my scripts and data, if I’m wrong, someone can find out how wrong I am. While I love corrections and always want to improve, the idea of being wrong this publicly scares me a lot. I’m putting a lot of myself out there for the world to criticize. Of course, going this far out of one’s comfort zone is also critical for improvement. But I can understand presenters who give those lists of tips and best practices – it seems less scary.

While thinking about the paper, and reading some other papers for inspiration, I noticed three interesting things that separate the papers I see in the Oracle world and the papers I see from the academia:

  1. Academic papers are usually written by more than one person, which is quite rare for Oracle papers.
  2. Academic papers always reference the latest and greatest work of their peers. Oracle papers, when they reference at all, usually reference older and more established work. Academics always present their work as improvement on something their peers did, while in the Oracle world we try to present our papers as original, even when it is not.
  3. Academic papers are published in peer reviewed journals. Oracle papers are rarely published and rarely peer reviewed.

I think that point #3 is the cause of the previous points. Since we don’t have peer reviewed journals with their associated prestige, all the prestige in Oracle world belongs to conferences and presentations. We write papers for conferences, and since we plan to be the one presenting on stage, we don’t collaborate. Of course, since there are no journals, it is more difficult to reference and improve on the most recent work done in our field. Just keeping updated is challenging.

It looks like a peer reviewed journal could greatly improve the quality of scientific Oracle papers out there. Of course, peer reviewed journal would be expensive and may not be justifiable from business perspective. Maybe we should start with a peer reviewed blog. I even have a title – “Oak Table Chronicles” :)

 

DDL Implicit Commit February 11, 2009

Filed under: concurrency — prodlife @ 6:49 am

We all know that DDL implicitly commits transactions. However, I was not aware that it commits both BEFORE and AFTER the DDL.

I ran the following code from session 1:

SQL> create table t1 (x number);

Table created.

SQL> insert into t1 values (1);

1 row created.

SQL> commit;

Commit complete.

SQL> create or replace procedure p1 as
2 begin
3 update t1 set x=2;
4 commit;
5 end;
6 /

Then in session 2:

SQL> update t1 set x=3;

1 row updated.

Now back in session 1:
SQL> exec p1

Session 1 promptly hangs, due to session 2 locking the row in t1.

Now in session 2, I run:

SQL> drop procedure p1;

I expected session 2 to hang as well, which makes it a deadlock.

However, what I saw was:
Procedure dropped.
in session 2 and
PL/SQL procedure successfully completed.
in session 1.

Which could only happen if the DDL drop procedure commited session 2, which caused session 1 to finish running the procedure, which enabled the drop to work.

Coffman et al (1971) showed that four conditions are necessary for a deadlock. The third condition is – no preemption – resources granted cannot be taken away from a process. By causing DDLs to force a commit before they run, Oracle forced preemption and avoided deadlocks.

I like that. Sometimes I hate the fact that DDLs force a commit, but now I see the benefits.
I’ve heard that in Postgres DDLs are transactional and can be rolled back. I wonder if my scenario would deadlock in Postgres.

 

Barbershop Queue Demo February 9, 2009

Filed under: concurrency — prodlife @ 6:37 am

Yet another classic concurrency problem.

This time, we have a barbershop. The shop employees one or more barbers. Customers come into the shop. If one of the barbers is free, he will give the customer a haircut. If all barbers are busy, the customer will wait in line until one of the barbers can take care of him. Barbers will handle customers on “first come first serve” basis. Of course, we don’t want to have two barbers working on the same customer at the same time, or giving haircuts to customers who no longer need them. We also don’t want barbers blocking each other, or blocking the entrance to the shop while working.

Its a nice problem, because you can check different queuing scenarios – fixed number of barbers while the number of customers grows, or how many barbers are needed to operate a shop with about 10 customers every second.

You can see my code here.

Normally, you would start one process running add_customers, with parameters telling it how many customers on average enter the shop every second, and how long should the demo run.
Then you add a process for each barber, with parameters for how long it takes to give a haircut, and how long the demo will run.

Something like:

(echo exec barber_shop.add_customers\(3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&
(echo exec barber_shop.add_barber\(0.3,60\) | sqlplus -s scott/tiger)&

Comments are welcome :)

Note: On rare occasions, the “add_customers” process waited over a minute between updating the customers and committing. “add_barber” did not run at all during that minute.
I suspect either my OS getting stuck somewhere or Oracle’s resource manager. However, when it happened I did not check v$sesstat or ASH to see what happened during that minute. Now it doesn’t reproduce, and I don’t have the session_id that had the issue to investigate.

If anyone tries running my code and runs into the same bug, please collect helpful events and timed statistics that may help debug it. I’ll buy a beer to anyone providing information that helps catch the bug (unless they are non-drinkers, in which case they can get orange juice ;) ).

 

Psychology of Instrumentation February 4, 2009

Filed under: musing — prodlife @ 7:01 pm

Cary Millsap reread Knuth and found a fascinating quote:

“I’ve become convinced that all compilers written from now on should be designed to provide all programmers with feedback indicating what parts of their programs are costing the most; indeed, this feedback should be supplied automatically unless it has been specifically turned off.”

Knuth wrote this in 1974. Its been 35 years. Compilers (and runtime environments such as JVM) still do not provide this feedback automatically. I believe that the need for this did not diminish.

Since compilers do not provide performance feedback, there is a very large market for instrumentation vendors. The sell you a bit of code that you can integrate in your programs, or install on your J2EE application servers or on your database, and it will give you exact information about where your program is spending its time.

I’ve worked for an instrumentation vendor, and I also talked to many of them as part of my production DBA role. Whenever an instrumentation vendor talks to a prospective customer, the first question is always: “What is the overhead?”. Not “How it can help me?”, “How much I can expect to improve my performance?” or “Is it easy to use?”. The answer to the first question is always the same, by the way, 5-10%.

The reason that this is always the first question is that people in general are always more concerned about losing something currently they have than about gaining something new. This tendency is called “endowment effect”, and it is quite powerful. Salespeople will tell you that supply is limited when you hesitate about buying something, because they know that fear of losing (even if it is just losing the option of buying this) is a more powerful motivator than any attractive features of the product.

In our case, developers and DBAs are more worried about losing something they have (Performance benefits of running non-instrumented code) than they are interested in the benefits of knowing where their product is spending time.

Suppose that all compilers and runtime environments had instrumentation enabled by default, with the option to turn it off. The same endowment effect would now work in our favor! No one would turn this off, because they’ll be more concerned about losing information they currently have (and became used to having) than they’ll care about the performance benefits of turning this instrumentation off.

The pervasive lack of instrumentation in software products is more a result of psychological bias than real technical concerns. Software vendors can work around these psychological issues by building instrumentation as a default into tools involved in the development and deployment process. Just as Knuth said 35 years ago.

In the mean while, I hope that by being aware of this irrational psychological bias, developers, DBAs and production application owners can overcome it and make sure they have the data they need to monitor and improve their applications.

 

 
Follow

Get every new post delivered to your Inbox.

Join 48 other followers