Anecdote about Rebuilding Indexes

A year ago I was handling a rather confusing performance issue. It was the usual problem – a query that took forever to run. While checking the issue, I discovered that one of the main tables that this query was querying had amazing size variance at different times it held anything between 3 rows to 3 million rows. This is obviously a design mistake that can indeed cause performance problems. An experienced DBA might have configured a job to collect statistics very frequently, or maybe something about bind variable peeking, or just go to the developers and demand an immediate fix.

I’m not an experienced DBA. I noticed that the indexes were highly fragmented with lots of deleted leaf nodes, and so I decided to rebuild the indexes. Online rebuild, so users will not be affected and parallel – so it will go faster.

It was a disaster. First of all, rebuilding online does affect performance. A lot. Users did notice. The table was effectively locked until we killed the rebuild. Its been a year and Oracle support is still silent about whether this is an expected behavior of online rebuild. Also, since the index was rebuilt with parallel option, many of the queries started running in parallel, which also had horrible effects and we had to fix this issue too.

After we cleaned up the entire mess, I got a stern lecture from Senior DBA: “You can’t solve performance problems by randomly rebuilding indexes. You must throughly understand the problem first”.

Fast forward to this weekend. Customer complains about very slow queries. Quick check on v$sql shows that each execution retrieves about 100 rows, but does about 10 million buffer gets. Ouch. The explain plan looked reasonable and I couldn’t see any cause for the excessive gets. I decide to wait with this to Monday, run a trace and see what is really happening there. I never got to run that trace. Senior DBA got home later that day, saw the emails about that query and immediately decides to rebuild indexes!

Needless to say, rebuilding indexes works great when you really understand the problem. It took 2 hours, caused no additional performance issues, and after it was done, the query required only 1000 buffer gets.

I still have some suspicions that the rebuild didn’t change much and what really caused the difference was the fact that Senior DBA also collected statistics on the entire table and all its indexes immediately after the rebuild, but we can’t really know, and as far as I saw – the plan did change.

Regardless of the exact facts, this reminded me of an AI Koan:

A novice was trying to fix a broken Lisp machine by turning the power off and on. Knight, seeing what the student was doing spoke sternly: “You can not fix a machine by just power-cycling it with no understanding of what is going wrong.” Knight turned the machine off and on. The machine worked.

(Knight is the inventor of the Lisp machine).

You can read more AI Koans here. Highly recommended.

Advertisements

10 Comments on “Anecdote about Rebuilding Indexes”

  1. > Its been a year and Oracle support is still
    > silent about whether this is an expected
    > behavior of online rebuild
    out of the doc :
    1) http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/statements_1009.htm#isthref4041
    Parallel DML is not supported during online index building

    2) http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/indexes.htm#sthref2541
    While you can perform DML operations during an online index build, Oracle recommends that you do not perform major/large DML operations during this procedure. This is because while the DML on the base table is taking place it holds a lock on that resource. The DDL to build the index cannot proceed until the transaction acting on the base table commits or rolls back, thus releasing the lock.

    3)
    Compare with 11g enhancement
    http://download.oracle.com/docs/cd/B28359_01/server.111/b28279/chapter1.htm#FEATURENO07174
    Online Index Creation and Rebuild Enhancements
    In highly concurrent environments, the requirement of acquiring a DML-blocking lock at the beginning and end of an online index creation and rebuild could lead to spikes of waiting DML operations and, therefore, a short drop and spike of system usage. While this is not an overall problem for the database, this anomaly in system usage could trigger operating system alarm levels. This feature eliminates the need for DML-blocking locks when creating or rebuilding an online index.

    Online index creation and rebuild prior to this release required a DML-blocking lock at the beginning and end of the rebuild for a short period of time. This meant that there would be two points at which DML activity came to a halt. This DML-blocking lock is no longer required, making these online index operations fully transparent.

    > I’m not an experienced DBA
    Yes you are 😉

  2. Freek says:

    One way to fix the problem with the number of table rows would be to not collect statistics, but instead trust on the optimizer_dynamic_sampling to gather statistics on the fly (using a limited part of the table).

  3. Joe says:

    I’ve bitten myself trying to gather space back from a production refreshed development environmnet by rebuilding indexes and then shrinking the datafiles. My main issue was that my server was getting CRUSHED by parallel slaves. The main reason is because I rebuilt my indexes “online parallel.” You think that it rebuilds them online in parallel right? Wrong! It actually sets the degree on the index, which will then allow the optimizer to choose parallel query!

    See Note: 196938.1 in metalink
    and for a fantastic example http://blog.tanelpoder.com/2007/06/23/a-gotcha-with-parallel-index-builds-parallel-degree-and-query-plans/

  4. prodlife says:

    Laurent,

    I wish the docs were that clear about online and parallel rebuild, but here’s what I’ve read:

    “Parallel DML is not supported during online index building. If you specify ONLINE and subsequently issue parallel DML statements, then Oracle Database returns an error.”

    Which appears to mean that if someone else is doing parallel DML during the rebuild, they will get an error.

    My rebuild command didn’t give any errors, the DML operations users tried to perform at the same time didn’t give any error either, they just hang.

    I don’t think it was the expected behavior, and maybe after a year Oracle should fix the docs or patch the issue.

    We will soon begin testing the application in 11g. I’ll see if this reproduces.

  5. prodlife says:

    Joe,

    Yes, exactly what happened to us 🙂
    It is very common trap.

  6. prodlife says:

    Freek,

    Thanks, thats an interesting option. I’ll definitely look into it.

  7. Virag Sharma says:

    The problem you faced , might faced by most of the DBAs. Only thing is that, you wrote your experience honestly and in simple word on blog.

    This is an expected behavior of online rebuild ,”laurentschneider” already answered most of the question/doubt. If you check alert.log , you will find some error/warning messages when online index rebuild about to finish and transaction were going on and if you search those error you will find those are due to lock.

    Oracle 11g new features – Online index rebuild with NO pause to DML , That mean early version have pause to DML on online index rebuild 🙂

    >collected statistics on the entire table
    >and all its indexes immediately
    We need to collect stats for entire table and indexes after indexes rebuild

  8. moshez says:

    As soon as I read your story, I was thinking about this AI koan. Woo 🙂

  9. Andy C says:

    Interesting real-life story.

    I don’t think such volatile tables are that rare. For example, Siebel queues aynchronous requests in tables.

    If the consumer process is stuck, broken, misconfigured or just slow then a huge backlog can accumulate which is cleared down when the issue is fixed.

    PS. Subscribed 🙂

  10. prodlife says:

    Hi Andy C,

    Well, I can’t say I’d recommend managing queues this way, but it is good to know that lessons learned here may be handy in the future.

    PS. Welcome 🙂


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s