Don’t use AWR for wait event analysis

Tonight, at midnight, inserts slowed to a crawl at one of our main production databases. This database is used to process about 300 inserts every second, but for about 20 minutes starting at midnight, each insert took almost a minute.  The problem resolved by itself, but now the application owner wants answers – what caused the slowdown at midnight?

A co-worker ran AWR and found that the top wait event is Enq: TM-Contention. A lock! This looks promising, but inserts usually don’t lock. Thats one of the most important principles of Oracle – readers don’t block writers and writers don’t block readers. The most locking I’d expect from insert will be a latch contention on a hot block. But co-worker is a very experienced DBA and he knew that TM-Contention can happen from non-indexed foreign key, and he started looking for one.
At this point I arrived at the office, and he updated me on the situation.
I don’t like AWR at all, so I took out my favorite tool – Confio. This is a nice graphical tool that helps with wait event analysis, but you can probably get pretty much the same data by digging into v$active_session_history yourself.

Confio also showed that TM-Contention is the largest wait event, but it also showed the queries waiting (lock table … in exclusive mode) and the machines. The table locked and the machines had nothing to do with the insert issue. A completely different part of the application. So, we can skip this event. I don’t think you can get this kind of information from AWR!

The second and third highest events are “db file sequential read” and “gc current request”. These events are so typical of selects (usually with a bad index), that if I saw them in AWR I’d probably assume they are unrelated, but Confio showed, clear as a day, that all the queries with this wait are inserts. I’m not talking about a fancy “insert into … select”, I’m talking about a plain “insert into … values”.

I still don’t know why a plain insert had to wait on “db file sequential read”, and why it had to do it only at midnight and stopped that 20 minutes later. But at least now I know where to look!

Note how AWR not only lead us in the wrong direction, it also doesn’t give enough information to follow up on the issue – like queries, tables and machines involved.


17 Comments on “Don’t use AWR for wait event analysis”

  1. Gary says:

    An insert often has to check to see if a parent is there. It should be a quick primary key access but occasionally you may find it needed to apply a lot of UNDO if the transaction has been long running and the block(s) have been heavily updated.

  2. coskan says:

    in my opinion, you should give at least a link of confio or a screenshot for proving your idea when you are making a big claim like “don’t use awr for wait events”

    Oracle developed awr to analyse wait events. If we don’t use it for wait event analyze, then What should we use it for ?

  3. joel garry says:

    What would be really ironic is if it were bug 5353909

  4. Karl says:

    Hi,
    for detailed wait anlaysis of current production problems I prefer ASH and the good old Waitevent -Views combined with active sessions of V$SESSION.

    AWR Data is compressed and too old to solve current prodcution issues?

    What do you think?

    Karl Reitschuster

  5. Karl says:

    You said
    “I still don’t know why a plain insert had to wait on “db file sequential read”, and why it had to do it only at midnight and stopped that 20 minutes later. But at least now I know where to look!”

    If you insert a row in a table *with* indexes then all new indexed columns have to be updated to the index (put the value and rowid into a leaf block). Update an index means you have first to scan the index -> “db file sequential read”

    Karl

  6. prodlife says:

    Coskan,
    I was unaware that AWR is used specifically for wait events analysis. Especially since it contains much more information than this – hit ratio, some IO statistics, CPU usage, etc. I used AWR to compare load on different systems, but I still think that when investigating performance issues, AWR is simply not informative enough.

    Confio is actually the company name. The product we used is Ignite for Oracle and it can be found here: http://www.confio.com/English/Products/Ignite_for_Oracle.php

    But I did not write this post as free advertisement for Confio. As I wrote in the post, you can get the exact same information from Oracle’s ASH and from many other tools (Quest performance analyst, Symantec I3, Lab128, Kyle Hailey’s free SASH)

  7. prodlife says:

    Karl,

    Confio’s Ignite is basically an expensive graphical front end for ASH. However, ASH alone lacks one important capability – it doesn’t keep the SQL that waited. If the SQL is still in the pool, you are fine, but if it isn’t – you still don’t have enough information. If you want to use ASH for analysis, you probably want to have a job that saves the SQL somewhere.

  8. I was reading this article and wanted to clarify a few points.

    1. Confio’s Ignite for Oracle product does not in any way read ASH data but rather collects it’s own data from the X$ tables. As a result, Ignite supports Oracle 8.0.6 – 11g as well as SQL Server, Sybase, DB2 and J2EE.

    2. Ignite for Oracle costs less than the OEM diagnostics and performance PAKs that you will be required to purchase if you want to query views like v$active_session_history.

    We hear stories similar to the original article all the time. DBAs that have tried our product love it and agree that it is the best product for diagnosing database slowdowns.

    Dean Richards
    Senior DBA, Confio Software
    http://www.confio.com

  9. prodlife says:

    Dean,

    Thanks for the information, and it is nice to know that you support 11g already.

  10. Kyle Hailey says:

    OEM 10g would tell you which queries are waiting on which wait events – it’s pretty easy, plus if you have the the diagnostic pack from Oracle you can run ADDM to get automatic analysis of any bottlenecks in Oracle.
    You can also collect all the same info as ASH on any version of Oracle from 7 to 11 with S-ASH http://ashmasters.com . There is also a free version of ASHMON an OEM-lite software.
    AWR does collect top sql and it’s available even after it’s aged out of shared pool.

  11. Johan says:

    > Ignite for Oracle … collects it’s own data from the X$ tables. As
    > a result, Ignite supports Oracle 8.0.6 – 11g as well as SQL
    > Server, Sybase, DB2 and J2EE.

    Ignite reads the very Oracle-specific X$ tables, and as a result it supports SQL Server/Sybase/DB2? Sounds a bit weird to me.

    J2EE is something totally different….

  12. prodlife says:

    Johan,

    Maybe I didn’t explain myself well.
    SQL Server, Sybase, DB2 and J2EE are different versions of ignite, which don’t use X$ views… I just mentioned that the possibility exist.

  13. dev_orcl says:

    It’s the main point is : in production system, application that hits oracle database is running continously. except in testing environment you can use your own script to Analyze the wait event. CMIIW

  14. Sarit Hod says:

    In Fact, AWR has tons of information that allows to troubeshoot effectively, espacially when it is used in conjuntion with ASH data (that it collects as well)

    We use Performance Explorer-i (from Enteros – see http://www.enteros.com ) and it enabled to identify very compex performance issues based on it’s analysis of AWR data.

    • prodlife says:

      Hi Sarit,

      AWR and ASH collect a lot of important data, the problem is that Oracle’s build in reports are not useful for debugging certain types of issues.
      Which is exactly why we need third party tools such as Performance Explorer-i, they show the same data but in a visual and correlated way that can be used for diagnosis.

  15. oPSiTo says:

    If you know what to ask to the awr and ash views, you receive your answer. You must prepare your queries to the awr like a visual tool but without graphics 🙂

  16. ashis says:

    You are right AWR does not give much infromation apart from the running sql queries.


Leave a comment