I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Starting Oracle on Windows March 28, 2008

Filed under: tips — prodlife @ 6:20 am

I never thought that starting up Oracle is difficult. Usually it looks something like this:

sqlplus / as sysdba
startup

Except when Oracle is used on Windows. We have one or two dev servers installed on Windows, but in the three years I’ve been working as a DBA, I never had to restart one. Today, I connected to one of these machines, noticed it was down and attempted to restart it as usual:

C:\Documents and Settings\chen>set ORACLE_SID=ORCL
C:\Documents and Settings\chen>sqlplus / as sysdba

SQL*Plus: Release 11.1.0.6.0 - Production on Thu Mar 27 22:56:25 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.

ERROR:
ORA-12560: TNS:protocol adapter error
Enter user-name:

C:\Documents and Settings\chen>set | grep ORA
ORACLE_HOME=D:\app\chen\product\11.1.0\db_1
ORACLE_SID=ORCL


C:\Documents and Settings\chen>sqlplus

SQL*Plus: Release 11.1.0.6.0 - Production on Thu Mar 27 22:57:11 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.

Enter user-name: / as sysdba
ERROR:
ORA-12560: TNS:protocol adapter error
Enter user-name:

At that point I recalled that Windows has these Oracle “service” things. So I started the service (from UI! how humiliating!) and surprise:

C:\Documents and Settings\chen>sqlplus

SQL*Plus: Release 11.1.0.6.0 - Production on Thu Mar 27 23:03:26 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

Seems that Windows will not let me connect to an idle instance to start it. Very strange.

 

Researching Resource Manager March 19, 2008

Filed under: maintenance, tips — prodlife @ 1:54 am

This week I’ve been looking into Oracle’s resource manager. It is a bit like “nice” for Oracle, only smarter and more complicated.

The idea is that you can create “consumer groups”, which is subsets of sessions divided by different criteria, and then decide how much CPU each group will get

There are relatively few resources around, so for mine and your convenience, I’ll list them here:

Oracle’s Database Administrator Guide is the most complete guide to resource manager. Everything is there.

PL/SQL Reference documents the DBMS_RESOURCE_MANAGER package. Can’t live without it, but they decided not to include any usage examples in the guide which makes it slightly less useful than I’d like, and forced me to go back and forth between the PL/SQL reference and the admin guide like crazy.

Jim Czuprynski at DBAsupport has a 3 part article about resource manager. Its only 9i, but it has very detailed examples and explanations. Part 3 contains test scripts. I found it very useful.

 

Story of Sysdate March 18, 2008

Filed under: musing, xfiles — prodlife @ 2:13 am

It started at 5pm, because all interesting production issues start at 5pm. As the on-call DBA, I recieved an alert about over used undo tablespace on one of our databases. First thing to do in such case is check v$transaction for transactions that have been running for a while. Turned out that v$transaction is empty.

The next suspect was the monitor itself. Maybe there is a bug in the query that checks undo tablespace usage? Checking the query, I noticed two interesting things:
1. It reported over 3G of used undo tablespace, while I know we only have 2G.
2. It contained the following lovely gem (completely unrelated to the post, but too cute not to mention): case when to_number(value) < 3600 then to_number(value)/60/1440 else to_number(value)/3600/24 end
Clearly the query needed some debugging, and I set to the task.

I usually start debugging by breaking the query into component subqueries and verifying that they give the expected results. This time, I started with the part that calculates the amount of undo bytes we need to keep due to our retention policy. The following query gave me all the undo history on the system. Not just the rows that were within our retention policy:

SELECT
to_char(begin_time,'dd-mm-yyyy hh24:mi:ss'),to_char(end_time,'dd-mm-yyyy hh24:mi:ss'),undoblks,txncount
FROM
V$UNDOSTAT US
WHERE
US.BEGIN_TIME >= SYSDATE - (
select
to_number(value)/(60*60*24)
from
v$parameter
where
name='undo_retention');

Very strange. The calculation looks right. Lets go all the way back to basics:
SQL> select to_char(SYSDATE,'dd-mm-yyyy hh24:mi:ss') from dual ;
TO_CHAR(SYSDATE,'DD
-------------------
02-03-2004 22:23:24

This is not an undo issue anymore. What happened to my sysdate? The documentation says that sysdate gets the date from the host OS. So I start there:
[oracle@vvsdb01 ~]$ date
Mon Mar 17 18:40:31 PDT 2008

OS looks fine. What about timestamp?
SQL> select sysdate from dual;
SYSDATE
--------------------
02 Mar 2004 22:23:24
SQL> select current_timestamp from dual;
CURRENT_TIMESTAMP
---------------------------------------------------------------------------
17-MAR-08 05.48.49.846746 PM -07:00

Timestamp is fine, but now senior DBA noticed that the sysdate time didn’t change at all since the last time we looked at sysdate. Surely more than few minutes passed? Quick search revealed a previously unknown system parameter – fixed_date. FIXED_DATE enables you to set a constant date that SYSDATE will always return instead of the current date.


SQL> show parameter FIX

NAME TYPE VALUE
———————————— ———– ——————————
fixed_date string 02-MAR-04
Yes. We had that. Checking the alert log to see when it happened:
Mon Mar 17 16:48:51 2008
ALTER SYSTEM SET fixed_date='2004-03-02 22:23:24' SCOPE=BOTH;

We caught it right on time! Lets fix it fast:
ALTER SYSTEM SET fixed_date=NONE SCOPE=BOTH;

And now how did it happen? We had the time it happened, so we checked /var/log/secure for matching ssh logins and got the IP.
The IP was of my own machine. I was completely flabbergasted for quite some time. But I think I reconstructed how it happened:
I was reading about some initialization parameters on this site, and I have a nasty habit of playing with the mouse while reading. I must have selected about half of that page. Then I remembered that I had other work to do and proceeded to copy-paste from vim window to sqlplus. Turned out that I didn’t really copy from vim, and when I pasted into sqlplus, I actually pasted half of that webpage. I saw that I got a bunch of errors because I just pasted a bunch of random sentences into sqlplus, so I cursed a bit and tried copying from vim again. What I did not notice was that among the other things I pasted by mistake was:

SQL> alter session set nls_date_format = 'dd.mon.yyyy hh24:mi:ss';
alter system set fixed_date='2004-03-02 22:23:24';
select sysdate from dual;
SYSDATE
-------------------
02.mar.2004 22:23:24
Session altered.

the problem was right there in my terminal screen all along!

 

Streams – Part II March 6, 2008

Filed under: scripts, streams, tips — prodlife @ 8:16 pm

Two weeks after the last streams post, and I need to give some updates and corrections.

1) I’ve written that:

“And you should also check which archived logs can be deleted:
SELECT r.CONSUMER_NAME,
r.NAME,
r.FIRST_SCN,
r.NEXT_SCN,
r.PURGEABLE
FROM DBA_REGISTERED_ARCHIVED_LOG r, DBA_CAPTURE c
WHERE r.CONSUMER_NAME = c.CAPTURE_NAME;"

Actually, that is not a good way to check for which archive logs can be deleted. According to Metalink note 373066.1 RMAN use dba_capture.required_checkpoint_scn to determine which logs can be purged. So I suggest to use this when doing managed backups as well:
SELECT r.CONSUMER_NAME,
r.NAME,
r.FIRST_SCN,
r.NEXT_SCN,
case when (r.next_scn > c.required_checkpoint_scn) then 'NO' else 'YES' end purgable
FROM DBA_REGISTERED_ARCHIVED_LOG r, DBA_CAPTURE c
WHERE r.CONSUMER_NAME = c.CAPTURE_NAME;

2) I’ve made some severe mistakes when trying to extend the streams implementation by replicating few more tables. I decided to start from scratch by running: dbms_streams_admin.remove_streams_configuration.

This does not work. The solution was found in Morgan’s Library:
-- the following provided by Michael M. Brennan from RSA Security who
-- reports that REMOVE_STREAMS_CONFIGURATION does not work unless the
-- propagation has been dropped first.
SELECT propagation_name FROM dba_propagation;
exec dbms_propagation_adm.drop_propagation('PROPAGATION$_143');
exec dbms_streams_adm.remove_streams_configuration;

3) Unfortunately, when I tried to start a new capture process, I find myself with the following unexpected problem:
SQL> select required_checkpoint_scn from dba_capture;

REQUIRED_CHECKPOINT_SCN
-----------------------
0

Not good. This means that my capture process is looking at the archive logs for scn 0. I have no control over required_checkpoint_scn and have no clue how this happened. Oracle support is still researching that.

But this means that I can refer you to two metalink articles that can help with debug (or at least with opening SRs on streams related issues):

Note 313279.1 “Capture Process Looping In Initializing/Dictionary Initialization State” shows how to start a trace against the capture process and how to interpret the result trace file.
Note 273674.1 “Streams Configuration Report and Health Check Script” contains cute scripts that summarize all the the streams configuration into a nice HTML page. I didn’t find the script very useful for my debugging, but it is absolutely mandatory when opening a streams related SR. They will not talk to you before you sent them the result of the script, and they’ll ask you to run it again whenever they run out of ideas on what to say next.