The Senile DBA Guide to Troubleshooting Sudden Growth in Redo Generation

I just troubleshot a server where the amount of redo generated suddenly exploded, to the point of running out of disk space.

Once I was done, with the problem found and the storage manager pacified, I decided to save the queries I used. This is a rather common issue, and the scripts will be useful next time.

It was very embarrassing to discover that I actually have 4 similar but not identical scripts for troubleshooting redo explosions. Now I have 5 🙂

Here are the techniques I use:

  1. First I determine whether there really is a problem, and when the excessive redo was generated, by looking at v$archived_log (one row per hour, with a per-node breakdown for RAC):
    select trunc(FIRST_TIME,'HH'), sum(BLOCKS) BLOCKS, count(*) cnt
    ,sum(decode(THREAD#,1,BLOCKS,0)) Node1_Blocks
    ,sum(decode(THREAD#,1,1,0)) Node1_Count
    ,sum(decode(THREAD#,2,BLOCKS,0)) Node2_Blocks
    ,sum(decode(THREAD#,2,1,0)) Node2_Count
    ,sum(decode(THREAD#,3,BLOCKS,0)) Node3_Blocks
    ,sum(decode(THREAD#,3,1,0)) Node3_Count
    from v$archived_log
    where FIRST_TIME > sysdate-30
    group by trunc(FIRST_TIME,'HH')
    order by trunc(FIRST_TIME,'HH') desc;
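As a sanity check on the numbers, the hourly BLOCKS sums from that query can be converted to MB/hour so the spike stands out. A minimal Python sketch; the 512-byte redo block size is an assumption (the BLOCK_SIZE column of v$archived_log gives the real value), and the sample rows are made up:

```python
REDO_BLOCK_SIZE = 512  # bytes; assumed -- confirm with v$archived_log.BLOCK_SIZE

def redo_mb_per_hour(rows):
    """rows: (hour, blocks) tuples as returned by the v$archived_log query."""
    return [(hour, blocks * REDO_BLOCK_SIZE / 1024 / 1024) for hour, blocks in rows]

# hypothetical hourly totals
rows = [("12:00", 4_000_000), ("11:00", 200_000)]
for hour, mb in redo_mb_per_hour(rows):
    print(f"{hour}: {mb:.0f} MB")
```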
  2. If the problem is still happening, I can use Tanel Poder’s Snapper to find the worst redo-generating sessions. Tanel explains how to do this on his blog.
  3. However, Snapper’s output is limited by design: it was written specifically for situations where you cannot create anything on the database. Since I’m normally the DBA on the servers I troubleshoot, I have another script that snapshots v$sesstat into a temporary table, joins to v$session, sorts the results, and so on.
    create global temporary table redo_stats
    ( runid varchar2(15),
      sid number,
      value int )
    on commit preserve rows;
    truncate table redo_stats;
    -- first snapshot of cumulative redo per session
    insert into redo_stats select 1,sid,value from v$sesstat ss
    join v$statname sn on ss.statistic#=sn.statistic#
    where sn.name='redo size';
    -- wait a while with the workload running, then take the second snapshot
    insert into redo_stats select 2,sid,value from v$sesstat ss
    join v$statname sn on ss.statistic#=sn.statistic#
    where sn.name='redo size';
    -- sessions sorted by redo generated between the snapshots (worst at the bottom)
    select s.sid, s.username, s.program, (b.value-a.value) redo_bytes
      from redo_stats a, redo_stats b, v$session s
     where a.sid = b.sid
       and s.sid = a.sid
       and a.runid = '1'
       and b.runid = '2'
       and (b.value-a.value) > 0
     order by (b.value-a.value);
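The script snapshots the cumulative 'redo size' statistic per session twice, subtracts, keeps the positive deltas, and sorts. The same delta logic, sketched in Python with invented sids and values:

```python
def redo_deltas(snap1, snap2):
    """snap1/snap2: dicts mapping sid -> cumulative 'redo size' in bytes."""
    deltas = {sid: snap2[sid] - snap1[sid]
              for sid in snap1.keys() & snap2.keys()
              if snap2[sid] - snap1[sid] > 0}
    # ascending, so the heaviest redo generator prints last, like the SQL version
    return sorted(deltas.items(), key=lambda kv: kv[1])

snap1 = {101: 5_000, 102: 1_000_000, 103: 42}
snap2 = {101: 6_000, 102: 900_000_000, 103: 42}  # sid 103 generated no redo
print(redo_deltas(snap1, snap2))
```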
  4. The last technique is normally the most informative, and requires a bit more work than the rest, so I save it for special cases: using logminer to find the offender in the redo logs. This is useful when the problem is no longer happening, but we want to see what the problem was last night. You can do a lot of analysis with the information in logminer, so the key is to dig around and see if you can isolate the problem. I’m just giving a small example here. You can filter logminer data by users, segments, specific transaction ids – the sky is the limit.
    -- Here I pick up the relevant redo logs.
    -- Adjust the where condition to match the times you are interested in.
    -- I picked the last day.
    select 'exec sys.dbms_logmnr.add_logfile(''' || name ||''');'
    from v$archived_log
    where FIRST_TIME >= sysdate-1
    and THREAD#=1
    and dest_id=1
    order by FIRST_TIME desc;
    -- Use the results of the query above to add the logfiles you are interested in
    -- Then start the logminer
    exec sys.dbms_logmnr.add_logfile('/u10/oradata/MYDB01/arch/1_8646_657724194.dbf');
    exec sys.dbms_logmnr.start_logmnr(options=>sys.dbms_logmnr.DICT_FROM_ONLINE_CATALOG);
    -- You can find the top users and segments that generated redo
    select seg_owner,seg_name,seg_type_name,operation ,min(TIMESTAMP) ,max(TIMESTAMP) ,count(*)
    from v$logmnr_contents 
    group by seg_owner,seg_name,seg_type_name,operation
    order by count(*) desc;
    -- You can get more details about the specific actions and the amounts of redo they caused.
    select LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE,count(*) cnt ,sum(RBABYTE) as RBABYTE ,sum(RBABLK) as RBABLK 
    from v$logmnr_contents 
    group by LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE
    order by count(*);
    -- don't forget to close the logminer when you are done
    exec sys.dbms_logmnr.end_logmnr;
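The generator query above simply templates one add_logfile call per archived log name; you spool its output and run it. The same templating, sketched in Python (the file name is the example path from the article):

```python
# Build the dbms_logmnr.add_logfile calls from a list of archived log
# names, mirroring what the generator query concatenates in SQL.
def add_logfile_calls(names):
    return ["exec sys.dbms_logmnr.add_logfile('{}');".format(n) for n in names]

for line in add_logfile_calls(["/u10/oradata/MYDB01/arch/1_8646_657724194.dbf"]):
    print(line)
```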
  5. If you have ASH, you can use it to find the sessions and queries that waited the most on the “log file sync” event. I found that this has some correlation with the worst redo generators.

    -- find the sessions and queries that waited most on log file sync, and when
    select SESSION_ID,user_id,sql_id,round(sample_time,'hh'),count(*)
    from V$ACTIVE_SESSION_HISTORY
    where event = 'log file sync'
    group by SESSION_ID,user_id,sql_id,round(sample_time,'hh')
    order by count(*) desc;
    -- you can look up the SQL text itself with:
    select * from DBA_HIST_SQLTEXT
    where sql_id='dwbbdanhf7p4a';

13 Comments on “The Senile DBA Guide to Troubleshooting Sudden Growth in Redo Generation”

  1. CJ says:

    Good article.

    I normally create a database logoff trigger that reads v$sesstat and inserts data into a table, and leave it running for a day or so. That usually reveals which session/schema/app is the worst offender.

    • prodlife says:

      For reasons I don’t 100% agree with, we avoid creating triggers in our environment at all. Your method sounds useful – as long as the offending session logged off.

      • CJ says:

        To give credit where due, I learnt this trick from the OWI book by Kirtikumar Deshpande et al. In fact they recommend it as a ‘performance data collector’ tool. Since we don’t have ASH licenses, I find this extremely useful. Lots of useful information can be derived from recording v$sesstat and v$session_event.

  2. Marko Sutic says:

    Nice trick using logminer to diagnose redo growth.

    We had a similar situation with a disk running out of space because of redo generation. If I had known this logminer trick then, troubleshooting would have been much simpler.
    Great article!


  3. Yas says:

    Chen, nice queries. But I do not agree with 5. A session generating the most redo may not be the one mostly waiting on log file sync. If the session runs a massive DML statement without a commit it never waits for the log file sync event. Right?

  4. Freek says:

    Also, you need to be careful when there are many sessions each doing a little bit of redo, instead of one big session.
    We had this problem once, when an application went crazy and restarted a certain action again and again (each time creating a session to the db, doing some work and logging off again).

    I had to use logminer to go through the actions to find out what was happening.

  5. […] Shapira offers The Senile DBA Guide to Troubleshooting Sudden Growth in Redo Generation, which begins, “I just troubleshooted a server where the amounts of redo generated suddenly […]

  6. […] Chen Shapira-The Senile DBA Guide To Troubleshooting Sudden Growth In Redo Generation […]

  7. redo generation says:

    Do you have a query to find which node (RAC) is generating the most redo?

  8. Don Seiler says:

    Just had to use this for a client. Thanks!

  9. Ganesh says:

    For the below query,

    In our database the sql_id is null for a few rows… Could you explain what the cause might be? Thanks.

    select SESSION_ID,user_id,sql_id,round(sample_time,'hh'),count(*) from V$ACTIVE_SESSION_HISTORY
    where event like 'log file sync'
    group by SESSION_ID,user_id,sql_id,round(sample_time,'hh')
    order by count(*) desc

    SESSION_ID user_id sql_id round(sample_time,'hh') count(*)
    429 1202 16/08/2011 12:00:00 PM 560
    345 22 16/08/2011 11:00:00 AM 4
    330 2784 16/08/2011 11:00:00 AM 3

    • prodlife says:

      A blank sql_id usually means the SQL is part of a scheduled job. It’s a well-known issue in the data reported in v$session.
