I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Fun with UNDO tablespaces February 23, 2008

Filed under: maintenance, tips — prodlife @ 3:08 am

Its been a long time since I last wrote about how things went very wrong when doing something very routine. Thankfully, this time it was on the test system.

It all started when our storage manager complained that our test system is taking too much space. Since it is important to keep a good relation with the storage manager, I agreed to take a look.
cd \u05\oradata\STG ls -lSh
and then, in a different window:
select * from dba_data_files

Turned out that we have two undo tablespaces, taking up over 50G together. Obviously, one of them is not in use and can be dropped. Easy win.
Started by checking which one is in use:
SQL> show parameter undo

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS1

And try to drop the other:
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
drop tablespace UNDOTBS2 including contents and datafiles
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS2' is currently in use

Strange, right? But no problem, I’ll connect to the DB machine, shutdown the test system, startup in restricted mode and then drop it. (Ignoring strange things is the first step on the road to hell!)


ssh dbstg02
sqlplus "/ as sysdba"
....
shutdown immediate
....
startup restrict
....
SQL> show parameter undo

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS2

Looks like now we are using the other UNDO? strange, but we won’t let that stop us, right? (mistake #2).

ALTER SYSTEM SET undo_tablespace='UNDOTBS1';
drop tablespace UNDOTBS2 including contents and datafiles;

Deleted file /u05/oradata/STG/undotbs02.dbf
Completed: drop tablespace UNDOTBS2 including contents and datafiles
shutdown immediate
startup

ORA-30012: undo tablespace 'UNDOTBS2' does not exist or of wrong type
ORA-1092 signalled during: ALTER DATABASE OPEN...

That was unexpected, but maybe for some reason the spfile was not written? lets startup nomount and switch to the right undo (Ignoring the 3rd sign that something went wrong, this time too late):

SQL> startup nomount
ORACLE instance started.
...
SQL> show parameter undo

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS2
SQL> alter system set undo_tablespace=UNDOTBS1;
alter system set undo_tablespace=UNDOTBS1
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-30012: undo tablespace 'UNDOTBS1' does not exist or of wrong type

SQL> select * from v$tablespace;

TS# NAME INC BIG FLA ENC
---------- ------------------------------ --- --- --- ---
0 SYSTEM YES NO YES
1 UNDOTBS1 YES NO YES

What? What do you mean “does not exist”? When in trouble or in doubt, use pfiles.

SQL> create pfile='/tmp/chen1.ini' from spfile;
File created.

And in the pfile I see…

STG1.undo_tablespace='UNDOTBS1'
STG2.undo_tablespace='UNDOTBS2'
*.undo_tablespace='UNDOTBS1'

Oops! I completely forgot that I’m maintaining a RAC system, and that “Oracle RAC automatically manages undo segments within a specific undo tablespace that is assigned to an instance. ” Each RAC node should have its own undo tablespace!

Fixing the mess:
I edited the pfile to show only the one undo tablespace I have and started one node.
Once I had a running database, I could recreate the missing tablespace:
create undo tablespace UNDOTBS2 datafile '/u05/oradata/STG/undotbs2_01.dbf' size 2G;
And assign each instance its own tablespace once again:
alter system set undo_tablespace=UNDOTBS1 SID='STG1';
alter system set undo_tablespace=UNDOTBS2 SID='STG2';

Of course, messing up a system like that should only be done by trained professionals. Kids, don’t try this at home!

*****
Messing up my own system is not nearly fun enough. Immediately after OOW2007, I started a habit of telling people how to mess up their system through the OTN forum. I’m happy to say that I messed up enough of these to get that nice “active member” gold icon next to my name :-)

 

Oracle Streams Replication Example February 21, 2008

Filed under: scripts, streams, tips — prodlife @ 4:04 am

I promised to show off our streams implementation, because I found so few examples out there. Keep in mind that your streams implementation will probably look nothing like mine at all, and even if it will look like mine, you can get the same results in very different way. This example is not an excuse to avoid reading the docs.

Our goal: Set up a reporting server, with replication of schemas arriving from many source DBs. We also want to filter out and not replicate some of the less critical tables.

Decisions, decisions: When implementing streams there are several things you want to ask yourself: Do you want bi-directional replications? Do you want to capture the changes on the source DB, destination DB or a third server? Do you do the instantiation (first copy of data) with DataPump or RMan?

We went with one-directional replication, capturing will be done on the destination server, and we’ll use DataPump to initially move the schemas to the destination.

Create users: You want to create a user both on source and destination server that will own all the streams related objects. Also, you want to give him his own tablespace (or at least keep his objects away from SYSTEM tablespace) because in case of errors lots of data can accumulate in stream queues. The docs recommend making the tablespace autoextend, but I’m very much against it. After creating the strmadmin user, the rest of my instructions should be run from this user.

CREATE TABLESPACE streams_tbs DATAFILE '/data/oradata/DB05/streams_tbs.dbf' SIZE 25M;

CREATE USER strmadmin
IDENTIFIED BY strmadminpw
DEFAULT TABLESPACE streams_tbs
QUOTA UNLIMITED ON streams_tbs;

GRANT DBA TO strmadmin;

BEGIN
DBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(
grantee => 'strmadmin',
grant_privileges => true);
END;

DBLinks: You will need a DBLink from source to destination DB. For convenience, I created a DBLink from destination to source which I’ll use during the configuration of Streams and will drop afterwards. Of course, the servers should be able to tnsping each other before this will work.


--on source
CONNECT strmadmin/strmadminpw;
CREATE DATABASE LINK REP01 CONNECT TO strmadmin IDENTIFIED BY strmadminpw
USING 'REP01';
select count(*) from dba_tables@REP01;

-- on target
CONNECT strmadmin/strmadminpw;
CREATE DATABASE LINK DB05 CONNECT TO strmadmin IDENTIFIED BY strmadminpw
USING 'DB05';
select count(*) from dba_tables@DB05;

Directory Objects: I decided to use Datapump for instantiation and to use DBMS_STREAMS to create a configuration scripts. This means I needed two directory objects on destination (for the script and Datapump) and one of the source (just Datapump). I used existing directories, so I had no need to create them, but in general you create a directory like so:

create directory EXP_DP_VOL as '/exp_dp';

Copying Archive redo logs: You need to get the archived redo logs from the source server to the server doing the capturing (destination server in my case, but can be a 3rd server). There are two decisions to make here: do you use ARCH or LGWR to write the log files to the remote location? And will the new archiving destination be MANDATORY or OPTIONAL (i.e. does the writing have to succeed before the corresponding redo file at the source can be overwritten)?
Also, when specifying the destination directory, you want it to be a separate directory from where REP01 redo logs are archived, and make extra sure that the directory really exists as you wrote it.
I specified a non-existent directory, this resulted in errors on the alert log about failure to archive (scary!), and after few retries the archiver stopped attempting to write to the new destination. So, after creating the directory I had to reset the log_archive_dest_2 parameter (i.e. set it to ” and then change it back again).

On the source server, we run:

ALTER SYSTEM set LOG_ARCHIVE_DEST_2='SERVICE=REP01 ARCH OPTIONAL NOREGISTER TEMPLATE=/u10/oradata/DB05_arch/DB05_arch_%t_%s_%r.redo';
ALTER SYSTEM set LOG_ARCHIVE_STATE_2=ENABLE;

Create instantiation and configuration script: Since we do replication of (almost) entire schemas, we could use DBMS_STREAMS_ADM.MAINTAIN_SCHEMAS to do both the instantiation and the configuration of Streams.
I used the option that creates a script with the instructions (instead of run everything). I had many problems with the script and ended up using only parts of it, and other parts doing in different ways. I’m not sure if it means that using a script is a bad idea (since I had so many problems) or a good idea (since the same problems would be more annoying if I did not have the script). To create the script, I ran the following command on the destination database. If you are using a 3rd server for capture, run it on the 3rd server:

BEGIN
DBMS_STREAMS_ADM.MAINTAIN_SCHEMAS(
schema_names => 'DEV110',
source_directory_object => 'EXPORTS_DIR',
destination_directory_object => 'EXP_DP_VOL',
source_database => 'DB05',
destination_database => 'REP01',
perform_actions => false,
script_name => 'src_pre_inst_streams.sql',
script_directory_object => 'EXP_DP_VOL',
dump_file_name => 'dev110.dmp',
capture_queue_table => 'rep_capture_queue_table',
capture_queue_name => 'rep_capture_queue',
capture_queue_user => NULL,
apply_queue_table => 'rep_dest_queue_table',
apply_queue_name => 'rep_dest_queue',
apply_queue_user => NULL,
capture_name => 'capture_110dev',
propagation_name => 'prop_110dev',
apply_name => 'apply_110dev',
log_file => 'export_110dev.clg',
include_ddl => true,
instantiation => DBMS_STREAMS_ADM.INSTANTIATION_SCHEMA);
END;

Running the result script: Try running the script you just created on the same DB where you ran the procedure to create it. Note that it asks you for connection details of three sites. The first should be source, the second the destination, and the third can be garbage as it is not used in the script at all.

If the stars are aligned right, maybe you run this script to the finish and now have a working schema replication. In my case, it ran fine until the Datapump part, and created all the capture processes and queues. However, the Datapump export failed with very unusual errors. I spent two days trying to get it to work, and then given up and simply used Datapump manually to export the schema from source and import into destination. After that I continued running the rest of the script, and things were fine.

Fun and games: Now that you have your own schema replication. It is time to enjoy it. Create new tables, create as select, update data, insert new data, delete data, drop tables, create procedures and types. Everything replicates!

Just remember that if you don’t commit, it won’t replicate. Reasons should be obvious. Also, if you are testing on low-volume system, you probably want to alter system switch logfile every few minutes so you can see your changes faster.

Watch the replication process: While you are replicating changes, you probably want to watch how they move from source to destination:

You can see when the last message was read from redo archive:
SELECT CAPTURE_NAME,
LOGMINER_ID,
AVAILABLE_MESSAGE_NUMBER,
TO_CHAR(AVAILABLE_MESSAGE_CREATE_TIME, 'HH24:MI:SS MM/DD/YY')
AVAILABLE_MESSAGE_CREATE_TIME
FROM V$STREAMS_CAPTURE;

You can see when the changes were entered into the capture queue:
SELECT CAPTURE_NAME,
(ENQUEUE_TIME-ENQUEUE_MESSAGE_CREATE_TIME)*86400 LATENCY_SECONDS,
TO_CHAR(ENQUEUE_MESSAGE_CREATE_TIME, 'HH24:MI:SS MM/DD/YY') CREATE_TIME,
TO_CHAR(ENQUEUE_TIME, 'HH24:MI:SS MM/DD/YY') ENQUEUE_TIME,
ENQUEUE_MESSAGE_NUMBER
FROM V$STREAMS_CAPTURE;

And you can see when the apply process started working on them:
SELECT APPLY_NAME,
(DEQUEUE_TIME-DEQUEUED_MESSAGE_CREATE_TIME)*86400 LATENCY,
TO_CHAR(DEQUEUED_MESSAGE_CREATE_TIME,'HH24:MI:SS MM/DD/YY') CREATION,
TO_CHAR(DEQUEUE_TIME,'HH24:MI:SS MM/DD/YY') LAST_DEQUEUE,
DEQUEUED_MESSAGE_NUMBER
FROM V$STREAMS_APPLY_READER;

Another nice thing to try is to generate errors. For example, you can delete a table from the destination, make a change to it in the source, and see how it doesn’t replicate. The error will be found here:
SELECT APPLY_NAME,
SOURCE_DATABASE,
LOCAL_TRANSACTION_ID,
ERROR_NUMBER,
ERROR_MESSAGE,
MESSAGE_COUNT
FROM DBA_APPLY_ERROR;

And you should also check which archived logs can be deleted:
SELECT r.CONSUMER_NAME,
r.NAME,
r.FIRST_SCN,
r.NEXT_SCN,
r.PURGEABLE
FROM DBA_REGISTERED_ARCHIVED_LOG r, DBA_CAPTURE c
WHERE r.CONSUMER_NAME = c.CAPTURE_NAME;

Excluding Tables:
The last thing I did was filter out the tables I did not need to replicate. Those are tables the application uses as queues or temps – tons of inserts and deletes, but the data is useless for reports.
I used the DBMS_STREAMS_ADM.ADD_TABLE_RULES procedure. I selected to do the filtering during the capture (you can also do it during apply).
The important bits: table_name MUST contain the schema name in it. Otherwise the procedure will assume that the table belongs to strmadmin, and nothing will get filtered.
inclusion_rule=>FALSE is the part that indicates that we don’t want this table.

begin
DBMS_STREAMS_ADM.ADD_TABLE_RULES(
table_name=>'DEV110.APP_QUEUE',
streams_type=>'capture',
streams_name=>'CAPTURE_110DEV',
queue_name=>'REP_CAPTURE_QUEUE',
include_dml=>TRUE,
include_ddl=>TRUE,
include_tagged_lcr=> FALSE,
source_database=>'DB05',
inclusion_rule=>FALSE,
and_condition=>NULL);
end;

You can then see the rule you created by running select * from DBA_RULES

What now: The process I just described got me safely past a POC. There is obviously lot more to be done before this solution goes production. The performance of the capturing, and what is the impact of lags is of concern. Also, the entire system needs to be recreated from scratch whenever we do “alter database open reset logs” on the source DB (hopefully not often). But the immediate next step for me is to prepare a nice presentation to management showing what a great solution we prepared and how useful it will be for the business and how much this will be worth the investment.

 

Oracle on NFS and TCP Throttling February 16, 2008

Filed under: Storage, hardcore, musing, nerdism, network, nfs, performance — prodlife @ 2:48 am

In an old post about the futility of best practices, I mentioned a strange best practice that is used in our organization: “mounting a single NFS volume to multiple mount points and configuring the DB to use them as though they were separate volumes (i.e. put different data files on different mount points).”

I was 100% sure that there was absolutely no reason for this practice. I thought it was there because someone misunderstood OFA, and no one changed it because they worried about imaginary performance implications. Reader comments on this blog farther convinced me of this.

This week I had lunch with a former colleague. He worked as a DBA in the same organization for at least 10 years before I joined, and he is a real expert about storage (A rare gift – DBA who knows storage). I had to ask him if this strange best practice was in effect when he was working here, and what did he think of it. As a response he burst out laughing. I thought it was because he also found this practice ridiculous, but it turned out (once he stopped laughing and was able to talk again) that he was the one who invented this best practice. He had excellent reasons for doing it. It is not his fault that the practice was kept long after the reasons were no longer relevant.

So, why would you want to mount the same volume on different mount points?

If you use NFS on top of TCP (I think no one does NFS on top of UDP anymore), and you have a heavy throughput system (like a data warehouse), you risk reaching the point that the ACKs from the Netapp are not arriving fast enough, and Linux will apply throttling on your connection.

The reason for this behavior lies in the TCP Congestion Control. The Congestion Control was introduced in eighties to prevent the internet from choking on noisy lines, and it is built around a dynamic value called TCP Congestion Window. TCP Congestion Window is the amount of data a server will agree to send without receiving any ACKs. If this amount of data was sent and no ACK arrived yet, the connection will patiently wait until the ACKs arrive. There are good reasons for this behavior:  First, we don’t want the  OS to risk run out of memory for keeping all those packets. But even more important is that it is good network behavior, maybe there is a bottleneck on the way, and the packets really never reach the target, if you continue to send them, the bottleneck will only get worse.

However, Linux defaults are really too conservative for database systems, which is why Oracle’s installation instructions include changes to these defaults:

net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=1048576
net.core.wmem_max=1048576

These parameters control the send and receive buffer sizes. In this post, I’m talking about the send buffer (wmem) which is used to govern the TCP Congestion Window. The receive buffer (rmem) is related how much data the machine will accept when acting as a client and is out of scope here. Oracle’s recommendations for the buffer sizes are a nice start, but you can change these values to match the throughput your system needs and your hardware supports.

So, now days improving throughput by playing with window sizes is all fun and games. But according to the old and wise DBA, back in the days of Linux 2.2, you did not want to change them. You had to work around the issue in a different way.

By mounting your single NFS volume on different mount points you could trick Linux into creating a separate TCP buffer for each separate connection (mount point), and now you have as many times the size of the original window as you want. As long as the physical line is keeping up, you are fine.

Great solution. Cheap, simple, scalable, maintainable, stable, everything you want.

Except that this same practice is used 10 years later, on a low-throughput OLTP systems that are not even close to saturating Oracle’s default window sizes and when there is an easier way to adjust the window anyway. Because no one knew why this was done.

Aside from the cool network details (I love cool network details, if you enjoy them as well, you probably want to read Tanel Poder’s blog), I think this illustrates the story behind many best practices – Once upon the time, it was a great solution to a problem. Then the problem was forgotten, the technology changed but the solution stayed.

 

Sources about Streams February 13, 2008

Filed under: books, streams, tips — prodlife @ 2:32 am

Streams is one of the least documented Oracle features. I’m used to start my research into a new area by reading few examples of how others used this feature. In the case of streams, very few examples were found and I had to work directly from Oracle documentation (the horrors!).

Here are the sources I’ve used while studying streams:

Lewis C. has the only useful example of streams I was able to find. In two parts. It was a good start but very soon I learned that streams is so customizable, that my use case is going to be very different than his.

Oracle’s Stream Replication Administrator Guide – Also known as “Everything you ever wanted to know about streams replication”. I found myself spending more time with this document than I did with my husband. At this point in the project I can cite entire sections verbatim. This is the streams replication Bible.

Oracle’s Streams Concepts and Administration Guide – Thats the second most important document. I started my project by reading the concepts part of it, and then referred to the administration part during the implementation stage. The chapter about monitoring was especially useful.

PL/SQL Packages Reference, DBMS_STREAMS_ADM – Documents the actual functions I’ve used. Of course, you can’t live without it – because you’ll always want to do things a bit differently from the examples in the administration guide.

If I were doing this project on 11g, I’d probably also try reading Oracle Database 2 Day + Data Replication and Integration Guide. But I only found out about this one today.

I’ll publish my own streams example in a day or two, so there will be at least one more example to work with.

 

Links of the Week – #4 February 1, 2008

Filed under: links — prodlife @ 6:51 pm

Friday morning is not complete without going over the latest Log Buffer and choosing few good articles to read over the weekend. Problem is that the last two Log Buffers are too MySQL oriented for me, and missing terrific posts from Oracle blogs.

As a service to my readers, here are the missing posts:

Rob Van-Wijk demonstrates the non-intuitive (but well documented) behavior of date format element ‘ww’.

Don Seiler found a bug with v$sql_bind_capture.

Tom Kyte doesn’t like a common exception handling technique.

Michael Dinh shows how to drop database (don’t try this on production!)

Christian Bilien has an excellent post about throughput.

Jared Still detects numbers with Translate().

Not exactly Oracle, but Steven Feuerstein wrote a fascinating post about Singapore.

Enjoy the weekend!