The Senile DBA Guide to Troubleshooting Sudden Growth in Redo Generation

I just finished troubleshooting a server where the amount of redo generated suddenly exploded, to the point of running out of disk space.

After I was done, the problem found and the storage manager pacified, I decided to save the queries I used. This is a rather common issue, and the scripts will be useful next time.

It was very embarrassing to discover that I actually have 4 similar but not identical scripts for troubleshooting redo explosions. Now I have 5 🙂

Here are the techniques I use:

  1. I determine whether there is really a problem, and when the excessive redo was generated, by looking at v$archived_log:
    select trunc(FIRST_TIME,'HH'), sum(BLOCKS) BLOCKS, count(*) cnt
    ,sum(decode(THREAD#,1,BLOCKS,0)) Node1_Blocks
    ,sum(decode(THREAD#,1,1,0)) Node1_Count
    ,sum(decode(THREAD#,2,BLOCKS,0)) Node2_Blocks
    ,sum(decode(THREAD#,2,1,0)) Node2_Count
    ,sum(decode(THREAD#,3,BLOCKS,0)) Node3_Blocks
    ,sum(decode(THREAD#,3,1,0)) Node3_Count
    from v$archived_log
    where FIRST_TIME > sysdate-30
    group by trunc(FIRST_TIME,'HH')
    order by trunc(FIRST_TIME,'HH') desc;
    
  2. If the problem is still happening, I can use Tanel Poder’s Snapper to find the worst redo-generating sessions. Tanel explains how to do this in his blog.
  3. However, Snapper’s output is very limited by the fact that it was written specifically for situations where you cannot create anything on the DB. Since I’m normally the DBA on the servers I troubleshoot, I have another script that actually gets information from v$session, sorts the results, etc.
    create global temporary table redo_stats
    ( runid varchar2(15),
      sid number,
      value int )
    on commit preserve rows;
    
    truncate table redo_stats;
    
    insert into redo_stats select 1,sid,value from v$sesstat ss
    join v$statname sn on ss.statistic#=sn.statistic#
    where name='redo size';
    
    commit;
    
    -- wait a while between the two snapshots so there is a delta to measure
    -- (for example: exec dbms_lock.sleep(60))
    
    insert into redo_stats select 2,sid,value from v$sesstat ss
    join v$statname sn on ss.statistic#=sn.statistic#
    where name='redo size';
    
    commit;
    
    select *
            from redo_stats a, redo_stats b,v$session s
           where a.sid = b.sid
           and s.sid=a.sid
             and a.runid = 1
             and b.runid = 2
             and (b.value-a.value) > 0
           order by (b.value-a.value)
    
  4. The last technique is normally the most informative, and requires a bit more work than the rest, so I save it for special cases. I’m talking about using LogMiner to find the offender in the redo logs. This is useful when the problem is no longer happening, but we want to see what the problem was last night. You can do a lot of analysis with the information in LogMiner, so the key is to dig around and see if you can isolate a problem. I’m just giving a small example here. You can filter LogMiner data by users, segments, specific transaction IDs; the sky is the limit.
    -- Here I pick up the relevant redo logs.
    -- Adjust the where condition to match the times you are interested in.
    -- I picked the last day.
    select 'exec sys.dbms_logmnr.add_logfile(''' || name ||''');'
    from v$archived_log
    where FIRST_TIME >= sysdate-1
    and THREAD#=1
    and dest_id=1
    order by FIRST_TIME desc
    
    -- Use the results of the query above to add the logfiles you are interested in.
    -- Then start the logminer.
    exec sys.dbms_logmnr.add_logfile('/u10/oradata/MYDB01/arch/1_8646_657724194.dbf');
    
    exec sys.dbms_logmnr.start_logmnr(options=>sys.dbms_logmnr.DICT_FROM_ONLINE_CATALOG);
    
    -- You can find the top users and segments that generated redo
    select seg_owner,seg_name,seg_type_name,operation ,min(TIMESTAMP) ,max(TIMESTAMP) ,count(*)
    from v$logmnr_contents 
    group by seg_owner,seg_name,seg_type_name,operation
    order by count(*) desc
    
    -- You can get more details about the specific actions and the amounts of redo they caused.
    select LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE,count(*) cnt ,sum(RBABYTE) as RBABYTE ,sum(RBABLK) as RBABLK 
    from v$logmnr_contents 
    group by LOG_ID,THREAD#,operation, data_obj#,SEG_OWNER,SEG_NAME,TABLE_SPACE
    order by count(*)
    
    -- don't forget to close the logminer when you are done
    exec sys.dbms_logmnr.end_logmnr;
    
  5. If you have ASH, you can use it to find the sessions and queries that waited the most on the “log file sync” event. I found that this has some correlation with the worst redo generators.

    -- find the sessions and queries causing most redo and when it happened
    
    select SESSION_ID,user_id,sql_id,round(sample_time,'hh'),count(*) from V$ACTIVE_SESSION_HISTORY
    where event = 'log file sync'
    group by SESSION_ID,user_id,sql_id,round(sample_time,'hh')
    order by count(*) desc
    
    -- you can look at the SQL itself with:
    select * from DBA_HIST_SQLTEXT
    where sql_id='dwbbdanhf7p4a'
    

Lessons From OOW09 #1 – Shell Script Tips

During OpenWorld I went to a session about shell scripting. The speaker, Ray Smith, was excellent. Clear, got the pace right, educating and entertaining.

His presentation was based on the book “The Art of Unix Programming” by one Eric Raymond. He recommended reading it, and I may end up doing that.

The idea is that shell scripts should obey two important rules:

  1. Shell scripts must work
  2. Shell scripts must keep working (even when Oracle takes BDUMP away).

Hard to object to that 🙂

Here’s some of his advice on how to achieve these goals (He had many more tips, these are just the ones I found non-trivial and potentially useful. My comments in italics.)

  1. Document dead ends, the things you tried and did not work, so that the next person to maintain the code won’t try them again.
  2. Document the script purpose in the script header, as well as the input arguments
  3. Be kind – try to make the script easy to read. Use indentation. It’s 2009; I’m horrified that “please indent” is still a relevant tip.
  4. Clean up temporary files you will use before trying to use them:

    function CleanUpFiles {
      [ -f "$LOGFILE" ] && rm -f "$LOGFILE"
      [ -f "$SPOOLFILE" ] && rm -f "$SPOOLFILE"
    }
  5. Revisit old scripts. Even if they work. Technology changes. This one is very controversial – do we really need to keep chasing the latest technology?
  6. Be nice to the users by working with them – verify before taking actions and keep user informed of what the script is doing at any time. OPatch is a great example.
  7. Error messages should explain errors and advise how to fix them
  8. The same script can work interactively or in cron by testing for a terminal: if tty -s; then … fi
  9. When sending email notifying of success or failure, be complete. Say which host, which job, what happened, how to troubleshoot, when is the next run (or what is the schedule).
  10. Dialog/Zenity – tools that let you easily create cool dialog screens
  11. Never hardcode passwords, hostnames, DB names, or paths. Use ORATAB, command line arguments or parameter files. I felt like clapping here. This is so obvious, yet we are now running a major project to modify all our scripts to be like that.
  12. Be consistent – try to use same scripts whenever possible and limit editing permissions
  13. Use version control for your scripts. Getting our team to use version control was one of my major projects this year.
  14. Subversion has HTTP access, so the internal knowledge base can point at the scripts. Wish I knew that last year.
  15. Use deployment control tool like CFEngine. I should definitely check this one out.
  16. Use getopts for parameters. Getopts looked too complicated when I first checked it out, but I should give it another try.
  17. Create everything you need every time you need it. Don’t fail just because a directory does not exist. Output what you just did.
  18. You can have common data files with things like hosts list or DB lists that are collected automatically on regular basis and that you can then reference in your scripts.
  19. You can put comments and descriptions in ORATAB
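Tips 8 and 16 can be combined into one skeleton. Here is a minimal sketch (the option letters and variable names are mine, not from the talk):

```shell
#!/bin/sh
# Parse -d <dbname> and -v with getopts, then detect whether we are
# running interactively (a terminal on stdin) or from cron.
DBNAME=""
VERBOSE=0

while getopts "d:v" opt; do
  case "$opt" in
    d) DBNAME="$OPTARG" ;;
    v) VERBOSE=1 ;;
    *) echo "Usage: $0 -d dbname [-v]" >&2; exit 1 ;;
  esac
done

if tty -s; then
  MODE="interactive"   # safe to prompt the user
else
  MODE="batch"         # cron run: no prompts, write to a log instead
fi

echo "db=$DBNAME verbose=$VERBOSE mode=$MODE"
```

Run as `./script.sh -d MYDB -v` from a terminal and MODE comes out interactive; the same invocation from cron takes the batch branch.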

Preparing to Clone!

We are moving to this new data center. The new servers are in the new data centers, just waiting for Oracle to be installed on them. We have about 50 new servers to install.

Obviously we want to install them as quickly as possible, with as little hands-on time as possible. To achieve this, we are checking two options: scripted silent installation and installation cloning.

I’m checking the installation cloning part, and I’m using Oracle’s Universal Installer User Guide for the process. One of the first steps in the process is:

At the source, you run a script called prepare_clone.pl.
This is a Perl script that prepares the source for cloning by recording the information that is required for cloning. This is generally located in the following location: $ORACLE_HOME/clone/bin/prepare_clone.pl.

When I tried to run the script, I found out that oracle user did not have execute permissions on the file. Then I found out that the script had the location of PERL hardcoded to the wrong location. Finally, I found out that the file had the usual amount of comments for an Oracle script, but only one line of code:
exit 0;

I found this incredibly amusing, so I decided to blog on this. While blogging, I took a closer look at the documentation, and found the following comment:

The need to perform the preparation phase depends on the Oracle product that you are installing. This script needs to be executed only for the Application Server Cloning. Database and CRS Oracle home Cloning does not need this.

I guess the joke is on me. Serves me right for not reading the instructions carefully.


Group By in Shell

The storage manager dropped by my cube to say that redo log archives for one of our DBs grew from 40G to 200G in the last day. Running “du -Sh” a few times showed that the files are no longer getting written as fast, so the issue is probably over. But what was it? And to start, when was it? Once I figure out when the issue occurred, I may be able to find the offending DML in ASH.

So, I need to do something like:

select day,hour,sum(bytes)
from (ls -l)
group by day,hour
order by day,hour

I guess I could have used the output of “ls -l”  as an external table and actually run this query. But I’m actually trying to improve my shell scripting skills. There must be some way to do it!

I asked on Twitter, and Kris Rice (of SQLDeveloper fame) pointed me in the right direction – use AWK.

Here’s the result:

ls -l | sed 's/:/ /g' | awk '{sizes[$7$8] += int($5)} END {for (hour in sizes)
{ printf("At %s %d bytes were inserted\n",hour,sizes[hour])}}' | sort

I run ls, then use sed to split the time into separate fields (Laurent Schneider will kill me for mixing sed and awk, but I’m not at his level yet…). After the split, the fifth field contains the file size, the seventh contains the date and the eighth the hour. So I have an associative array called “sizes”, indexed on the date and hour, where each bucket accumulates the sum of the file sizes for that hour.

Then I loop over the array, print the results, and sort them to make the output more readable.
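For the record, the same grouping can be done with awk alone, splitting the hh:mm field inside awk instead of pre-processing with sed. A sketch, assuming the common `ls -l` layout with the size in $5, the day in $7 and the time in $8:

```shell
# Sum file sizes per day+hour; split($8, t, ":") breaks "hh:mm"
# so t[1] is the hour. The $5 guard skips the "total" line.
ls -l | awk '$5 ~ /^[0-9]+$/ {
  split($8, t, ":")
  sizes[$7 " " t[1]] += $5
} END {
  for (hour in sizes)
    printf("At %s %d bytes were inserted\n", hour, sizes[hour])
}' | sort
```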

A long post for a short script, but I want to make sure I don’t forget it 😉

Anyway, the best part by far was everyone who tried to help with my desperate call on Twitter/Facebook. It was unexpected and very, very nice. About 5 people sent tips, ideas and even solutions! I can only imagine that doing my work is more fun than doing their own, but I appreciate the help a lot.


Fork() on Windows.

I was assigned a very simple task – write a script that opens SSH connection to a remote server, waits 30 seconds and closes the connection. About 5 minutes of work on Linux.

On windows it took an entire day.

First attempt: Using MKS32’s KSH.

d:/sitescope/tools/plink sitescope@testdb -pw pass -m d:/sshTest_cmd2 &
SSH_PID=$!
echo $SSH_PID
(sleep 30 ; kill $SSH_PID)

Problem: The ssh process executed within a subshell, and I only got the PID for the subshell. Killing the subshell did not kill the SSH process.
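For comparison, on Linux the plain shell version does behave: `&` backgrounds the command itself and `$!` holds its real PID. A sketch, using `sleep` as a stand-in for the plink call:

```shell
# Stand-in for the ssh/plink command: background it, grab its PID,
# wait, then kill it if it is still alive.
sleep 300 &          # in the real script this would be: plink ... &
SSH_PID=$!
echo "started pid $SSH_PID"

sleep 3              # the real script waits 30 seconds
if kill -0 "$SSH_PID" 2>/dev/null; then
  kill "$SSH_PID"    # still running: terminate it
  echo "killed $SSH_PID"
fi
```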

Second attempt: Using Perl’s fork, exec and wait.

$pid = fork;
if($pid == 0) {
    exec("d:/sitescope/tools/plink sitescope\@testdb -pw pass -m d:/sshTest_cmd2");
}elsif ($pid > 0) {
    sleep 30;
    kill(9,$pid);
} else {
    print "fork failed";
};

Problem: Under Windows, fork is not a real fork. It creates a “pseudo-process” (actually a thread), and $pid is a negative number, not a real process ID. After the exec, the child thread becomes a real process with a PID, but again, I won’t have that PID.

Third attempt: Perl has a WIN32 package, which includes a process object, allowing me to create a process, wait for it, and kill it.

use Win32;
use Win32::Process;
use Win32::Process qw(STILL_ACTIVE);

sub ErrorReport{
    print Win32::FormatMessage( Win32::GetLastError() );
}

$params = "plink sitescope\@$ARGV[0] -pw $ARGV[1] -m D:\\sshTest_cmd";
print "$params \n";

Win32::Process::Create($ProcessObj,"D:\\SiteScope\\tools\\plink.exe",$params,0,
NORMAL_PRIORITY_CLASS,".")|| die ErrorReport();

$PID = $ProcessObj->GetProcessID();
print "$PID \n";

$ProcessObj->Wait(30000);
$ProcessObj->GetExitCode($exitcode);

print "exit code is: $exitcode \n";

if ($exitcode == 0) {
    print "exiting normally \n";
    exit 0;
} elsif ($exitcode == STILL_ACTIVE) {
    print "killing ssh \n";
    $ProcessObj->Kill(59);
    exit 59;
} else {
    print "ssh exited abnormally \n";
    exit $exitcode;
}

Finally something works 🙂


Read Only User

“Oh mighty DBA” said the developer, “You are wise and very attractive. Can I please have access to the TEST system of my application? I need to debug an issue that QA found and I cannot reproduce”.

(I’m paraphrasing a bit).

“No way. You will change a few things while you are there and ruin the test cycle. Even if you don’t, QA will think you did, and that will ruin the test cycle.”

“I see your point, wise and attractive DBA. In this case, can I have just read-only access to the system?”

(I hate this part in the conversation. When a request sounds innocent and trivial, but is in-fact complicated and full of trickery.)

“I’ll see what I can do. But you don’t get to execute any functions. Just select on table and views. And you’ll need to prefix the table names with the name of the original user.”

(Well, actually, giving him synonyms is easy enough. If he insists, I’ll add them.)

I wrote the function that does it. It is horrible in several ways:

  1. It uses “execute immediate”.
  2. No bind variables. DDL statements cannot use them, but it still means I’m doing bad things to the library cache.
  3. I add “with grant” privilege to all object privileges the original user had. I did not want to touch the original user, but as I explained before, I had no choice. I could have been more selective here, though.

I hope it will be useful, but I post it mostly as a rough reference. Think twice before you use it.

CREATE OR REPLACE
PROCEDURE GRANT_READ_ONLY(RO_USER in varchar2, ORIG_USER in varchar2,
    with_grant in BOOLEAN default true) AUTHID current_user AS
  stmt varchar2(4000);
  get_grant_objs varchar2(4000);
  TYPE cur_typ IS REF CURSOR;
  c_cursor cur_typ;
  TYPE user_priv_typ is record (
      table_name all_tab_privs.table_name%type,
      privilege all_tab_privs.privilege%type,
      grantor all_tab_privs.grantor%type);
  user_priv_rec user_priv_typ;

BEGIN
  if (with_grant) then
      get_grant_objs := 'select table_name,privilege,grantor from dba_TAB_PRIVS dtp
          join dba_objects do on dtp.grantor=do.owner and dtp.table_name=do.object_name
          where grantee=''' || ORIG_USER || '''  order by TABLE_NAME';
      open c_cursor for get_grant_objs;
      LOOP
        FETCH c_cursor into user_priv_rec;
        EXIT WHEN c_cursor%NOTFOUND;
        stmt := 'GRANT ' || user_priv_rec.privilege || ' ON ' || user_priv_rec.grantor || '.' ||
            user_priv_rec.table_name || ' TO ' || ORIG_USER || ' WITH GRANT OPTION';
        execute immediate stmt;
        --dbms_output.put_line(stmt);
      END LOOP;
      CLOSE c_cursor;
  end if;
  for obj in 
  (
  select object_name from all_objects
  where owner=ORIG_USER
  and object_type in ('SEQUENCE','TABLE','VIEW','MATERIALIZED VIEW'))
  LOOP
    stmt := 'GRANT SELECT ON ' || ORIG_USER || '.' || obj.object_name || ' TO ' || RO_USER;
    execute immediate stmt;
    --dbms_output.put_line(stmt);
  END LOOP;
END;

Joining Unrelated Data

I’m sure this happened to every DBA – an important customer sends an Excel spreadsheet and explains that he wants this report generated from the DB on a weekly basis.

At first look the report seems easy enough and you say “Sure, you’ll have it ready by the end of the week”, but on second look something seems off and you give the customer a call: “Listen, it seems that the data in the columns is not really related. The first column contains your customers and the second contains the contact emails for each customer, which is fine. But the third column contains the open issues for the customer, and it is not really related to the contact emails column.”
And the customer calmly replies: “You are right, it isn’t related. That’s how I want the report. I’ll have it by Friday, yes?”

No choice but to write a non-relational join. I do this by generating an artificial row-number column on each side (with an analytic function) and joining on that.
For example:

SQL> create table customers (cust_id number, cust_name varchar(20));

Table created.

SQL> insert into customers values (1,'cust A');

1 row created.

SQL> insert into customers values (2,'cust B');

1 row created.

SQL> insert into customers values (3,'cust C');

1 row created.

SQL> create table contacts (cust_id number, email varchar(64));

Table created.

SQL> insert into contacts values (1,'a1@a.com');

1 row created.

SQL> insert into contacts values (1,'a2@a.com');

1 row created.

SQL> insert into contacts values (2,'b1@b.com');

1 row created.

SQL> insert into contacts values (3,'c1@c.com');

1 row created.

SQL> insert into contacts values (3,'c2@c.com');

1 row created.

SQL> insert into contacts values (3,'c3@c.com');

1 row created.

SQL> create table issues (cust_id number, description varchar(4000));

Table created.

SQL> insert into issues values (1,'Late delivery');

1 row created.

SQL> insert into issues values (1,'Missing components');

1 row created.

SQL> insert into issues values (2,'Wrong address');

1 row created.

SQL> insert into issues values (3,'Malfunctions');

1 row created.

SQL> col email format a10
SQL> col description format a20
SQL> select nvl(t1.cust_name,t2.cust_name),email,description from
  2  (select dense_rank() over (partition by cust_name order by email) n, cust_name, email
  3  from customers cr join contacts cn on cr.cust_id=cn.cust_id) t1
  4  full outer join
  5  (select dense_rank() over (partition by cust_name order by description) n, cust_name, description
  6  from customers cr join issues i on cr.cust_id=i.cust_id) t2
  7  on t1.cust_name=t2.cust_name and t1.n=t2.n;

NVL(T1.CUST_NAME,T2. EMAIL      DESCRIPTION
-------------------- ---------- --------------------
cust A               a1@a.com   Late delivery
cust A               a2@a.com   Missing components
cust B               b1@b.com   Wrong address
cust C               c1@c.com   Malfunctions
cust C               c2@c.com
cust C               c3@c.com

6 rows selected.

Very ugly. I hope the customer will be happy with his report.