The Senile DBA Guide to Troubleshooting Sudden Growth in Redo GenerationPosted: November 4, 2009
I just troubleshooted a server where the amounts of redo generated suddenly exploded to the point of running out of disk space.
After I was done, the problem was found and the storage manager pacified, I decided to save the queries I used. This is a rather common issue, and the scripts will be useful for the next time.
It was very embarrassing to discover that I actually have 4 similar but not identical scripts for troubleshooting redo explosions. Now I have 5 🙂
Here are the techniques I use:
- I determine whether there is really a problem and the times the excessive redo was generated by looking at v$log_history:
select trunc(FIRST_TIME,'HH') ,sum(BLOCKS) BLOCKS , count(*) cnt ,sum(decode(THREAD#,1,BLOCKS,0)) Node1_Blocks ,sum(decode(THREAD#,1,1,0)) Node1_Count ,sum(decode(THREAD#,2,BLOCKS,0)) Node2 ,sum(decode(THREAD#,2,1,0)) Node2_Count ,sum(decode(THREAD#,3,BLOCKS,0)) Node3 ,sum(decode(THREAD#,3,1,0)) Node3_Count from v$archived_log where FIRST_TIME >sysdate -30 group by trunc(FIRST_TIME,'HH') order by trunc(FIRST_TIME,'HH') desc
- If the problem is still happening, I can use Tanel Poder’s Snapper to find the worse redo generating sessions. Tanel explains how to do this in his blog.
- However, Snapper’s output is very limited by the fact that it was written specifically for situations where you cannot create anything on the DB. Since I’m normally the DBA on the servers I troubleshoot, I have another script that actually gets information from v$session, sorts the results, etc.
create global temporary table redo_stats ( runid varchar2(15), sid number, value int ) on commit preserve rows; truncate table redo_stats; insert into redo_stats select 1,sid,value from v$sesstat ss join v$statname sn on ss.statistic#=sn.statistic# where name='redo size' commit; insert into redo_stats select 2,sid,value from v$sesstat ss join v$statname sn on ss.statistic#=sn.statistic# where name='redo size' commit; select * from redo_stats a, redo_stats b,v$session s where a.sid = b.sid and s.sid=a.sid and a.runid = 1 and b.runid = 2 and (b.value-a.value) > 0 order by (b.value-a.value)
- Last technique is normally the most informative, and requires a bit more work than the rest, so I save it for special cases. I’m talking about using logminer to find the offender in the redo logs. This is useful when the problem is no longer happening, but we want to see what was the problem last night. You can do a lot of analysis with the information in logminer, so the key is to dig around and see if you can isolate a problem. I’m just giving a small example here. You can filter log miner data by users, segments, specific transaction ids – the sky is the limit.
-- Here I pick up the relevant redo logs. -- Adjust the where condition to match the times you are interested in. -- I picked the last day. select 'exec sys.dbms_logmnr.add_logfile(''' || name ||''');' from v$archived_log where FIRST_TIME >= sysdate-1 and THREAD#=1 and dest_id=1 order by FIRST_TIME desc -- Use the results of the query above to add the logfiles you are interest in -- Then start the logminer exec sys.dbms_logmnr.add_logfile('/u10/oradata/MYDB01/arch/1_8646_657724194.dbf'); exec sys.dbms_logmnr.start_logmnr(options=>sys.dbms_logmnr.DICT_FROM_ONLINE_CATALOG); -- You can find the top users and segments that generated redo select seg_owner,seg_name,seg_type_name,operation ,min(TIMESTAMP) ,max(TIMESTAMP) ,count(*) from v$logmnr_contents group by seg_owner,seg_name,seg_type_name,operation order by count(*) desc -- You can get more details about the specific actions and the amounts of redo they caused. select LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE,count(*) cnt ,sum(RBABYTE) as RBABYTE ,sum(RBABLK) as RBABLK from v$logmnr_contents group by LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE order by count(*) -- don't forget to close the logminer when you are done exec sys.dbms_logmnr.end_logmnr;
If you have ASH, you can use it to find the sessions and queries that waited the most for “log file sync” event. I found that this has some correlation with the worse redo generators.
-- find the sessions and queries causing most redo and when it happened select SESSION_ID,user_id,sql_id,round(sample_time,'hh'),count(*) from V$ACTIVE_SESSION_HISTORY where event like 'log file sync' group by SESSION_ID,user_id,sql_id,round(sample_time,'hh') order by count(*) desc -- you can look the the SQL itself by: select * from DBA_HIST_SQLTEXT where sql_id='dwbbdanhf7p4a'