I just troubleshooted a server where the amounts of redo generated suddenly exploded to the point of running out of disk space.
After I was done, the problem was found and the storage manager pacified, I decided to save the queries I used. This is a rather common issue, and the scripts will be useful for the next time.
It was very embarrassing to discover that I actually have 4 similar but not identical scripts for troubleshooting redo explosions. Now I have 5
Here are the techniques I use:
- I determine whether there is really a problem and the times the excessive redo was generated by looking at v$log_history:
select trunc(FIRST_TIME,'HH') ,sum(BLOCKS) BLOCKS , count(*) cnt ,sum(decode(THREAD#,1,BLOCKS,0)) Node1_Blocks ,sum(decode(THREAD#,1,1,0)) Node1_Count ,sum(decode(THREAD#,2,BLOCKS,0)) Node2 ,sum(decode(THREAD#,2,1,0)) Node2_Count ,sum(decode(THREAD#,3,BLOCKS,0)) Node3 ,sum(decode(THREAD#,3,1,0)) Node3_Count from v$archived_log where FIRST_TIME >sysdate -30 group by trunc(FIRST_TIME,'HH') order by trunc(FIRST_TIME,'HH') desc
- If the problem is still happening, I can use Tanel Poder’s Snapper to find the worse redo generating sessions. Tanel explains how to do this in his blog.
- However, Snapper’s output is very limited by the fact that it was written specifically for situations where you cannot create anything on the DB. Since I’m normally the DBA on the servers I troubleshoot, I have another script that actually gets information from v$session, sorts the results, etc.
create global temporary table redo_stats ( runid varchar2(15), sid number, value int ) on commit preserve rows; truncate table redo_stats; insert into redo_stats select 1,sid,value from v$sesstat ss join v$statname sn on ss.statistic#=sn.statistic# where name='redo size' commit; insert into redo_stats select 2,sid,value from v$sesstat ss join v$statname sn on ss.statistic#=sn.statistic# where name='redo size' commit; select * from redo_stats a, redo_stats b,v$session s where a.sid = b.sid and s.sid=a.sid and a.runid = 1 and b.runid = 2 and (b.value-a.value) > 0 order by (b.value-a.value) - Last technique is normally the most informative, and requires a bit more work than the rest, so I save it for special cases. I’m talking about using logminer to find the offender in the redo logs. This is useful when the problem is no longer happening, but we want to see what was the problem last night. You can do a lot of analysis with the information in logminer, so the key is to dig around and see if you can isolate a problem. I’m just giving a small example here. You can filter log miner data by users, segments, specific transaction ids – the sky is the limit.
-- Here I pick up the relevant redo logs. -- Adjust the where condition to match the times you are interested in. -- I picked the last day. select 'exec sys.dbms_logmnr.add_logfile(''' || name ||''');' from v$archived_log where FIRST_TIME >= sysdate-1 and THREAD#=1 and dest_id=1 order by FIRST_TIME desc -- Use the results of the query above to add the logfiles you are interest in -- Then start the logminer exec sys.dbms_logmnr.add_logfile('/u10/oradata/MYDB01/arch/1_8646_657724194.dbf'); exec sys.dbms_logmnr.start_logmnr(options=>sys.dbms_logmnr.DICT_FROM_ONLINE_CATALOG); -- You can find the top users and segments that generated redo select seg_owner,seg_name,seg_type_name,operation ,min(TIMESTAMP) ,max(TIMESTAMP) ,count(*) from v$logmnr_contents group by seg_owner,seg_name,seg_type_name,operation order by count(*) desc -- You can get more details about the specific actions and the amounts of redo they caused. select LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE,count(*) cnt ,sum(RBABYTE) as RBABYTE ,sum(RBABLK) as RBABLK from v$logmnr_contents group by LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE order by count(*) -- don't forget to close the logminer when you are done exec sys.dbms_logmnr.end_logmnr; -
If you have ASH, you can use it to find the sessions and queries that waited the most for “log file sync” event. I found that this has some correlation with the worse redo generators.
-- find the sessions and queries causing most redo and when it happened select SESSION_ID,user_id,sql_id,round(sample_time,'hh'),count(*) from V$ACTIVE_SESSION_HISTORY where event like 'log file sync' group by SESSION_ID,user_id,sql_id,round(sample_time,'hh') order by count(*) desc -- you can look the the SQL itself by: select * from DBA_HIST_SQLTEXT where sql_id='dwbbdanhf7p4a'
Subscribe
Good article.
I normally create a database logoff trigger that reads v$sesstat and inserts data into a table, and leave it running for a day or so. That usually reveals which session/schema/app is the worst offender.
For reasons I don’t 100% agree with, we avoid creating triggers in our environment at all. Your method sounds useful – as long as the offending session logged off.
To give credit where due, I learnt this trick from the OWI book by Kirtikumar Deshpande et al. In fact they recommend it as a ‘performance data collector’ tool. Since we don’t have ASH licenses, I find this extremely useful. Lots of useful information can be derived from recording v$sesstat and v$session_event.
Nice trick using logminer to diagnose redo growth.
We had similar situation with disk running out of space cause of redo generation. If I knew then this trick with logminer troubleshooting would be much simpler.
Great article!
Regards,
Marko
Chen, nice queries. But I do not agree with 5. A session generating the most redo may not be the one mostly waiting on log file sync. If the session runs a massive DML statement without a commit it never waits for the log file sync event. Right?
Also, you need to be carefull when there are many sessions, each doing a little bit of redo instead of 1 big session.
We had this problem once, when an application went crazy and restarted a certain action again and again (each time creating a session to the db, do some work and logging off again).
I had to use logminer to go through the actions to find out what was happening.
[...] Shapira offers The Senile DBA Guide to Troubleshooting Sudden Growth in Redo Generation, which begins, “I just troubleshooted a server where the amounts of redo generated suddenly [...]
[...] Chen Shapira-The Senile DBA Guide To Troubleshooting Sudden Growth In Redo Generation [...]