Unexpected Side Effect of Table Shrinking

Last year, I wrote a bit about table shrinking. It generated quite a bit of discussion, and both Freek d’Hooge and Venz warned about some of the side effects involved, especially when using “enable row movement”.

I just discovered another effect of row movement. As the name implies, after enabling row movement, rows get moved around. This can impact the clustering factor of your indexes, the primary key included.

If you have an index on a monotonically increasing column (such as “create date”), the table and the index are ordered in the same way. This results in a low clustering factor, which means the optimizer considers the index more favorably for some operations. I’m not a big expert on the optimizer, but I assume that if the index and the table are ordered the same way and you need to scan a portion of the table, the index can be used to find the first row and then the table itself can be scanned. This should be about twice as fast as scanning the index and visiting the table for each row to fetch the columns that are not in the index. (Again, this is common sense only; I have not tested it or even studied it deeply.)
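
To make this concrete, here is the kind of query I have in mind (a sketch with a hypothetical orders table, its rows inserted in create_date order):

-- With a low clustering factor, an index range scan on create_date
-- visits each table block roughly once. With a high clustering factor,
-- it may revisit table blocks once per row, and the optimizer's cost
-- model penalizes the index accordingly.
select order_id, customer_id, status
from orders
where create_date between date '2008-01-01' and date '2008-01-31';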

Anyway, so you have this nicely ordered table, and a nicely ordered index to match, and you probably enjoy all kinds of performance benefits from this arrangement.

Now you enable row movement and shrink. Rows get moved around, and the order of rows in the table no longer matches the order of rows in the index. So you have a smaller table (fewer blocks) and a higher clustering factor. Maybe scanning portions of the table will be faster since the table is smaller and you read fewer blocks, but maybe it will take longer because the order changed and the index is less useful now. In any case, this is something you should consider before deciding to shrink a table with row movement – it can have unexpected effects on your plans.

Here’s a small test I ran to verify:


SQL> create table test (id number);

Table created.

SQL> INSERT INTO test SELECT rownum FROM dual CONNECT BY LEVEL <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index test_i on test (id);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                 20      10000                16

SQL> delete from test where mod(id,2)=0;

5000 rows deleted.

SQL> commit;

Commit complete.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                 20       5000                16

SQL> alter table test enable row movement ;

Table altered.

SQL> alter table test shrink space cascade;

Table altered.

SQL> alter table test disable row movement ;

Table altered.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                  8       5000                22


Would You Rather Maintain SQL or PL/SQL?

Rob Van Wijk wrote about the reputation that SQL has as unmaintainable. He claims that SQL is not inherently less maintainable, but rather that developers who say that SQL is difficult to maintain are simply not as skilled in SQL as they are in PL/SQL, and they certainly don’t document their SQL to the same level they document their PL/SQL.

I mostly agree with Rob. It is not a coincidence that developers are more comfortable with PL/SQL than with SQL. PL/SQL is a procedural language. Nearly everyone learned programming with a procedural language, and probably programmed in one for at least three years before being introduced to SQL.

SQL is a different beast. It is a data manipulation language based (roughly) on the relational model, and most of its operators come from relational algebra. Quite different from Pascal, C or Java. It’s not really surprising that most programmers are never entirely comfortable around SQL and keep being surprised by its behavior.

I think there are a few other reasons that SQL is perceived as less maintainable, though:

  1. Readability of SQL is heavily dependent on the underlying data model. A good model will lead to more readable and maintainable SQL. When the data model does not match the reports that need to be generated, PL/SQL may have an advantage, its readability being less impacted by the data model.
  2. Differences in how SQL statements are written can lead to differences in performance. This is somewhat true in PL/SQL as well, but I think SQL is more sensitive. Performance often takes priority over readable code. Not using subquery factoring because it results in sub-optimal optimizer decisions is one consequence (see the sketch after this list). Of course, the reasons for writing SQL in a specific way are very rarely documented, making the code even less maintainable.
  3. As a result of #2, developers who will gladly refactor PL/SQL code to make it more readable will think at least twice before refactoring SQL. It is nearly trivial to predict how a given code change will impact PL/SQL performance, but nearly impossible to do the same for SQL (unless you are Jonathan Lewis).
  4. Debugging is a big issue with SQL. In PL/SQL it is easy to add a few put_line messages to track what exactly your code is doing and why, and there are even nice commercial debuggers. Finding bugs in SQL is still a bit of a black art that must be done without any tools or methods.
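
To illustrate #2, here is a sketch of the subquery factoring tradeoff, using the familiar SCOTT tables:

-- The readable version: factor the aggregation out with a WITH clause.
with dept_salaries as (
  select deptno, avg(sal) as avg_sal
  from emp
  group by deptno
)
select e.ename, e.sal, d.avg_sal
from emp e, dept_salaries d
where e.deptno = d.deptno
and e.sal > d.avg_sal;

Sometimes the optimizer materializes the factored subquery into a temporary table when merging it into the main query would have been cheaper (or the other way around), so the statement gets rewritten as an inline view purely for the sake of the plan, and the readability is lost.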

I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better, but at least the discussion is a bit more complete.


Researching Resource Manager

This week I’ve been looking into Oracle’s resource manager. It is a bit like “nice” for Oracle, only smarter and more complicated.

The idea is that you can create “consumer groups”, which are subsets of sessions divided by different criteria, and then decide how much CPU each group will get.

There are relatively few resources around, so for my convenience and yours, I’ll list them here:

Oracle’s Database Administrator’s Guide is the most complete guide to resource manager. Everything is there.

PL/SQL Reference documents the DBMS_RESOURCE_MANAGER package. Can’t live without it, but they decided not to include any usage examples, which makes the guide slightly less useful than I’d like and forced me to go back and forth between the PL/SQL reference and the admin guide like crazy.
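
Since the reference has no examples, here is a minimal sketch of the kind of setup the admin guide walks through (the group and plan names are made up, and note that every plan must also include a directive for OTHER_GROUPS):

begin
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'REPORTS_GROUP',
    comment        => 'long-running reports');
  dbms_resource_manager.create_plan(
    plan    => 'DAYTIME_PLAN',
    comment => 'limit reports during the day');
  dbms_resource_manager.create_plan_directive(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTS_GROUP',
    comment          => 'reports get at most 20% of CPU',
    cpu_p1           => 20);
  dbms_resource_manager.create_plan_directive(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'everyone else',
    cpu_p1           => 80);
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
end;
/

-- Nothing happens until the plan is activated:
alter system set resource_manager_plan = 'DAYTIME_PLAN';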

Jim Czuprynski at DBAsupport has a three-part article about resource manager. It’s only 9i, but it has very detailed examples and explanations. Part 3 contains test scripts. I found it very useful.


Fun with UNDO tablespaces

It’s been a long time since I last wrote about how things went very wrong while doing something very routine. Thankfully, this time it was on the test system.

It all started when our storage manager complained that our test system was taking too much space. Since it is important to keep good relations with the storage manager, I agreed to take a look.
cd /u05/oradata/STG
ls -lSh
and then, in a different window:
select * from dba_data_files

Turned out that we have two undo tablespaces, taking up over 50G together. Obviously, one of them is not in use and can be dropped. Easy win.
Started by checking which one is in use:
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1

And try to drop the other:
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
drop tablespace UNDOTBS2 including contents and datafiles
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS2' is currently in use

Strange, right? But no problem, I’ll connect to the DB machine, shut down the test system, start up in restricted mode, and then drop it. (Ignoring strange things is the first step on the road to hell!)


ssh dbstg02
sqlplus "/ as sysdba"
....
shutdown immediate
....
startup restrict
....
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS2

Looks like we are now using the other undo tablespace? Strange, but we won’t let that stop us, right? (mistake #2).

ALTER SYSTEM SET undo_tablespace='UNDOTBS1';
drop tablespace UNDOTBS2 including contents and datafiles;

Deleted file /u05/oradata/STG/undotbs02.dbf
Completed: drop tablespace UNDOTBS2 including contents and datafiles
shutdown immediate
startup

ORA-30012: undo tablespace 'UNDOTBS2' does not exist or of wrong type
ORA-1092 signalled during: ALTER DATABASE OPEN...

That was unexpected, but maybe for some reason the spfile was not updated? Let’s start up nomount and switch to the right undo (ignoring the third sign that something went wrong, this time too late):

SQL> startup nomount
ORACLE instance started.
...
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS2
SQL> alter system set undo_tablespace=UNDOTBS1;
alter system set undo_tablespace=UNDOTBS1
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-30012: undo tablespace 'UNDOTBS1' does not exist or of wrong type

SQL> select * from v$tablespace;

       TS# NAME                           INC BIG FLA ENC
---------- ------------------------------ --- --- --- ---
         0 SYSTEM                         YES NO  YES
         1 UNDOTBS1                       YES NO  YES

What? What do you mean “does not exist”? When in trouble or in doubt, use pfiles.

SQL> create pfile='/tmp/chen1.ini' from spfile;
File created.

And in the pfile I see…

STG1.undo_tablespace='UNDOTBS1'
STG2.undo_tablespace='UNDOTBS2'
*.undo_tablespace='UNDOTBS1'

Oops! I completely forgot that I’m maintaining a RAC system, and that “Oracle RAC automatically manages undo segments within a specific undo tablespace that is assigned to an instance.” Each RAC node should have its own undo tablespace!
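
In hindsight, one query before dropping anything would have shown the whole picture, since the gv$ views cover all instances (a sketch):

-- gv$parameter shows the setting on every open instance,
-- not just the one you are connected to.
select inst_id, name, value
from gv$parameter
where name = 'undo_tablespace';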

Fixing the mess:
I edited the pfile to show only the one undo tablespace I have and started one node.
Once I had a running database, I could recreate the missing tablespace:
create undo tablespace UNDOTBS2 datafile '/u05/oradata/STG/undotbs2_01.dbf' size 2G;
And assign each instance its own tablespace once again:
alter system set undo_tablespace=UNDOTBS1 SID='STG1';
alter system set undo_tablespace=UNDOTBS2 SID='STG2';

Of course, messing up a system like that should only be done by trained professionals. Kids, don’t try this at home!

*****
Messing up my own system is not nearly fun enough. Immediately after OOW2007, I started a habit of telling people how to mess up their system through the OTN forum. I’m happy to say that I messed up enough of these to get that nice “active member” gold icon next to my name 🙂


DBA on a deserted island

Sometimes I like asking myself: if I had to manage large databases on a deserted island, what tools would I take with me? What do I really use day in and day out and would not want to administer my databases without?

Let’s start building the system from the ground up:

Netapp storage will be a must. I’m addicted to many of Netapp’s capabilities, such as snapshots and flexible cloning, and it would be difficult to live without them. I also prefer my storage on NFS.

The servers would be Intel, 64-bit, with lots of memory and several fast network cards. I don’t care about much more than that.

Linux, of course. I’m used to RH, but will work with anything. Bash, perl, gvim, sar, vmstat, top, ps, ssh and all my scripts and aliases should be included. Don’t forget to install rlwrap.

The DB? Oracle, of course. 10.2.0.3 works great, but 11g may be more fun. Can I take both? Throw in a few patches. Throw in the relevant Oracle documentation and books. Can I also download the PSOUG library?

It is not absolutely mandatory, but performance tuning goes much faster with proper tools. Either Confio’s or Oracle’s will do.

That’s it. I’m ready to rock 🙂

Needless to say, I did not get paid by any vendor for mentioning them here. On the contrary, my company pays them quite a bit and they deliver the goods.


Anecdote about Rebuilding Indexes

A year ago I was handling a rather confusing performance issue. It was the usual problem – a query that took forever to run. While checking the issue, I discovered that one of the main tables the query touched had amazing size variance: at different times it held anything between 3 rows and 3 million rows. This is obviously a design mistake that can indeed cause performance problems. An experienced DBA might have configured a job to collect statistics very frequently, or maybe done something about bind variable peeking, or just gone to the developers and demanded an immediate fix.

I’m not an experienced DBA. I noticed that the indexes were highly fragmented, with lots of deleted leaf rows, so I decided to rebuild the indexes. Online, so users would not be affected, and parallel, so it would go faster.

It was a disaster. First of all, rebuilding online does affect performance. A lot. Users did notice. The table was effectively locked until we killed the rebuild. It’s been a year, and Oracle support is still silent about whether this is expected behavior for an online rebuild. Also, since the index was rebuilt with the parallel option, many of the queries started running in parallel, which also had horrible effects, and we had to fix that issue too.
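
The parallel side effect, by the way, happens because rebuilding with the parallel option leaves that degree on the index as a permanent attribute. The fix is along these lines (hypothetical index name):

-- Reset the degree so the optimizer stops considering
-- parallel plans just because of the rebuild.
alter index orders_i noparallel;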

After we cleaned up the entire mess, I got a stern lecture from Senior DBA: “You can’t solve performance problems by randomly rebuilding indexes. You must thoroughly understand the problem first.”

Fast forward to this weekend. A customer complained about very slow queries. A quick check of v$sql showed that each execution retrieved about 100 rows but did about 10 million buffer gets. Ouch. The explain plan looked reasonable, and I couldn’t see any cause for the excessive gets. I decided to wait with this until Monday, run a trace, and see what was really happening there. I never got to run that trace. Senior DBA got home later that day, saw the emails about the query, and immediately decided to rebuild indexes!
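
The quick check on v$sql was along these lines (a sketch; the real predicates varied):

-- Find statements doing an unreasonable number of buffer gets
-- per execution.
select sql_id, executions, buffer_gets, rows_processed,
       round(buffer_gets / executions) as gets_per_exec
from v$sql
where executions > 0
order by gets_per_exec desc;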

Needless to say, rebuilding indexes works great when you really understand the problem. It took 2 hours, caused no additional performance issues, and after it was done, the query required only 1000 buffer gets.

I still have some suspicions that the rebuild didn’t change much, and that what really caused the difference was the fact that Senior DBA also collected statistics on the entire table and all its indexes immediately after the rebuild. But we can’t really know, and as far as I saw, the plan did change.

Regardless of the exact facts, this reminded me of an AI Koan:

A novice was trying to fix a broken Lisp machine by turning the power off and on. Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.” Knight turned the machine off and on. The machine worked.

(Knight is the inventor of the Lisp machine).

You can read more AI Koans here. Highly recommended.


Interesting observation about “drop user cascade”

I started two “drop user cascade” commands: one from SQL Developer on my machine, and the other by ssh’ing to the server and running sqlplus locally (a normal run; I didn’t use nohup or anything like that). Each of the users had about 70,000 objects, so a few hours later, when I had to go home, I closed SQL Developer (reasoning that it can’t possibly cause a rollback), left the ssh session open, disconnected my laptop from the network, and went home.

The next morning, the user I was dropping from SQL Developer had the same number of objects as it did when I closed the client the night before. But the user I dropped through ssh and sqlplus was completely gone. Obviously, the server process behind the sqlplus session didn’t notice that its client went away and continued to run.
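
By the way, if you ever need to check from another connection whether such a detached session is still doing work, something like this does it (a sketch; filter on whoever ran the drop):

-- For an active session, last_call_et shows how many seconds
-- the current call has been running.
select sid, serial#, status, last_call_et
from v$session
where username = 'SYSTEM'
and status = 'ACTIVE';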