Unexpected Side Effect of Table Shrinking

Last year, I wrote a bit about table shrinking.  It generated quite a bit of discussion and both Freek d’Hooge and Venz warned about some of the side effects involved, especially when using “enable row movement”.

I just discovered another effect of row movement. As the name implies, after enabling row movement, rows get moved around. This can impact the clustering factor of your primary key.

If you had an index on a monotonically increasing column (such as “create date”), the table and the index would be ordered in the same way. This results in a low clustering factor, and it means that the index is considered more favorably by the optimizer for some operations. I’m not a big expert on the optimizer, but I assume that if the index and table are ordered in the same way, and you need to scan a portion of the table, the index can be used to find the first row and then the table itself can be scanned. This method should be about twice as fast as scanning the index and going to the table for each row to get the extra columns that are not in the index. (Again, this is common sense only; I did not test it or even study it deeply.)

Anyway, so you have this nicely ordered table, and a nicely ordered index to match, and you probably enjoy all kinds of performance benefits from this arrangement.

Now you enable row movement and shrink. Rows get moved around, and the order of rows in the table no longer matches the order of rows in the index. So you have a smaller table (fewer blocks) and a higher clustering factor. Maybe scanning portions of the table will be faster since the table is smaller and you read fewer blocks, but maybe it will take longer because the order changed and the index is less useful now. In any case, this is something you should consider before deciding to shrink a table with row movement – it can have unexpected effects on your plans.

Here’s a small test I ran to verify:


SQL> create table test (id number);

Table created.

SQL> INSERT INTO test SELECT rownum FROM dual CONNECT BY LEVEL <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create index test_i on test (id);

Index created.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                 20      10000                16

SQL> delete from test where mod(id,2)=0;

5000 rows deleted.

SQL> commit;

Commit complete.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                 20       5000                16

SQL> alter table test enable row movement ;

Table altered.

SQL> alter table test shrink space cascade;

Table altered.

SQL> alter table test disable row movement ;

Table altered.

SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I';

TABLE_NAME                     INDEX_NAME                         BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ------------------------------ ---------- ---------- -----------------
TEST                           TEST_I                                  8       5000                22


Would You Rather Maintain SQL or PL/SQL?

Rob Van Wijk wrote about the reputation that SQL has as unmaintainable. He claims that SQL is not inherently less maintainable, but rather that developers who say that SQL is difficult to maintain are simply not as skilled in SQL as they are in PL/SQL, and they certainly don’t document their SQL to the same level they document their PL/SQL.

I mostly agree with Rob. It is not a coincidence that developers are more comfortable with PL/SQL than with SQL. PL/SQL is a procedural language. Nearly everyone learned programming with a procedural language, and probably programmed in a procedural language for at least three years before being introduced to SQL.

SQL is a different beast. It is a data manipulation language based (roughly) on the relational model, and most of its operators come from relational algebra. Quite different from Pascal, C or Java. It’s not really surprising that most programmers are never entirely comfortable around SQL and keep being surprised by its behavior.

I think there are a few other reasons that SQL is perceived as less maintainable, though:

  1. Readability of SQL is heavily dependent on the underlying data model. A good model will lead to more readable and maintainable SQL. When the data model does not match the reports that need to be generated, PL/SQL may have an advantage, its readability being less impacted by the data model.
  2. Differences in how SQL statements are written can lead to differences in performance. This is somewhat true in PL/SQL as well, but I think SQL is more sensitive. Performance often takes priority over readable code. Not using subquery factoring because it results in sub-optimal optimizer decisions is one consequence (there’s a small example after this list). Of course, the reasons for writing SQL in a specific way are very rarely documented, making the code even less maintainable.
  3. As a result of #2 – developers who will gladly refactor PL/SQL code to make it more readable, will think at least twice before refactoring SQL. It is nearly trivial to predict how a given code change will impact PL/SQL performance, but nearly impossible to do the same for SQL (unless you are Jonathan Lewis).
  4. Debugging is a big issue with SQL. In PL/SQL it is easy to add a few put_line messages to track what exactly your code is doing and why. There are even nice commercial debuggers. Finding bugs in SQL is still a bit of a black art that must be done without any tools or methods.
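To illustrate the subquery factoring point from #2: the WITH clause lets you name an intermediate result set once instead of repeating it inline, which usually reads better, but the optimizer may treat the two forms differently. This is just a sketch – the table and column names are made up:

with dept_costs as (
  select department_id, sum(salary) total_salary
  from   employees
  group  by department_id
)
select d.department_name, c.total_salary
from   departments d, dept_costs c
where  d.department_id = c.department_id
and    c.total_salary > 100000;

The same query can be written with the aggregation as an inline view; whether you get to keep the readable version often depends on which form the optimizer handles better, and that decision is rarely written down anywhere.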

I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better, but at least the discussion is a bit more complete.


Researching Resource Manager

This week I’ve been looking into Oracle’s resource manager. It is a bit like “nice” for Oracle, only smarter and more complicated.

The idea is that you can create “consumer groups”, which are subsets of sessions grouped by different criteria, and then decide how much CPU each group will get.
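To give a feel for what this looks like, here is a minimal sketch using DBMS_RESOURCE_MANAGER. The plan and group names are made up and the CPU levels are just an example:

BEGIN
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'REPORTS_GROUP',
    comment        => 'long running reports');
  dbms_resource_manager.create_plan(
    plan    => 'DAYTIME_PLAN',
    comment => 'favor everything else over reports during the day');
  -- level 1 CPU goes to everyone else; reports get whatever is left at level 2
  dbms_resource_manager.create_plan_directive(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'OTHER_GROUPS',
    comment          => 'everyone else',
    cpu_p1           => 100);
  dbms_resource_manager.create_plan_directive(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTS_GROUP',
    comment          => 'reports get leftover CPU',
    cpu_p2           => 100);
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
END;
/

You still have to map sessions to the consumer groups and set the resource_manager_plan parameter before any of this takes effect, which is exactly where the documentation below comes in.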

There are relatively few resources around, so for my convenience and yours, I’ll list them here:

Oracle’s Database Administrator Guide is the most complete guide to resource manager. Everything is there.

PL/SQL Reference documents the DBMS_RESOURCE_MANAGER package. Can’t live without it, but they decided not to include any usage examples in the guide which makes it slightly less useful than I’d like, and forced me to go back and forth between the PL/SQL reference and the admin guide like crazy.

Jim Czuprynski at DBAsupport has a three-part article about resource manager. It’s only 9i, but it has very detailed examples and explanations. Part 3 contains test scripts. I found it very useful.


Fun with UNDO tablespaces

It’s been a long time since I last wrote about how things went very wrong when doing something very routine. Thankfully, this time it was on the test system.

It all started when our storage manager complained that our test system was taking too much space. Since it is important to keep a good relationship with the storage manager, I agreed to take a look.
cd /u05/oradata/STG
ls -lSh
and then, in a different window:
select * from dba_data_files

Turned out that we had two undo tablespaces, taking up over 50G together. Obviously, one of them is not in use and can be dropped. Easy win.
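Something along these lines (an approximation, not the exact query I ran) makes the split obvious:

select tablespace_name, round(sum(bytes)/1024/1024/1024) gb
from   dba_data_files
group  by tablespace_name
order  by 2 desc;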
Started by checking which one is in use:
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS1

And try to drop the other:
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
drop tablespace UNDOTBS2 including contents and datafiles
*
ERROR at line 1:
ORA-30013: undo tablespace 'UNDOTBS2' is currently in use

Strange, right? But no problem, I’ll connect to the DB machine, shut down the test system, start up in restricted mode and then drop it. (Ignoring strange things is the first step on the road to hell!)


ssh dbstg02
sqlplus "/ as sysdba"
....
shutdown immediate
....
startup restrict
....
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS2

Looks like now we are using the other UNDO? Strange, but we won’t let that stop us, right? (mistake #2).

ALTER SYSTEM SET undo_tablespace='UNDOTBS1';
drop tablespace UNDOTBS2 including contents and datafiles;

Deleted file /u05/oradata/STG/undotbs02.dbf
Completed: drop tablespace UNDOTBS2 including contents and datafiles
shutdown immediate
startup

ORA-30012: undo tablespace 'UNDOTBS2' does not exist or of wrong type
ORA-1092 signalled during: ALTER DATABASE OPEN...

That was unexpected, but maybe for some reason the spfile was not updated? Let’s start up nomount and switch to the right undo (ignoring the 3rd sign that something went wrong, this time too late):

SQL> startup nomount
ORACLE instance started.
...
SQL> show parameter undo

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
undo_management                      string      AUTO
undo_retention                       integer     900
undo_tablespace                      string      UNDOTBS2
SQL> alter system set undo_tablespace=UNDOTBS1;
alter system set undo_tablespace=UNDOTBS1
*
ERROR at line 1:
ORA-02097: parameter cannot be modified because specified value is invalid
ORA-30012: undo tablespace 'UNDOTBS1' does not exist or of wrong type

SQL> select * from v$tablespace;

       TS# NAME                           INC BIG FLA ENC
---------- ------------------------------ --- --- --- ---
         0 SYSTEM                         YES NO  YES
         1 UNDOTBS1                       YES NO  YES

What? What do you mean “does not exist”? When in trouble or in doubt, use pfiles.

SQL> create pfile='/tmp/chen1.ini' from spfile;
File created.

And in the pfile I see…

STG1.undo_tablespace='UNDOTBS1'
STG2.undo_tablespace='UNDOTBS2'
*.undo_tablespace='UNDOTBS1'

Oops! I completely forgot that I’m maintaining a RAC system, and that “Oracle RAC automatically manages undo segments within a specific undo tablespace that is assigned to an instance.” Each RAC node should have its own undo tablespace!
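Had I looked at the per-instance settings before dropping anything, the whole adventure would have been avoided. A query along these lines shows the same thing the pfile did:

select sid, name, value
from   v$spparameter
where  name = 'undo_tablespace';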

Fixing the mess:
I edited the pfile to show only the one undo tablespace I have and started one node.
Once I had a running database, I could recreate the missing tablespace:
create undo tablespace UNDOTBS2 datafile '/u05/oradata/STG/undotbs2_01.dbf' size 2G;
And assign each instance its own tablespace once again:
alter system set undo_tablespace=UNDOTBS1 SID='STG1';
alter system set undo_tablespace=UNDOTBS2 SID='STG2';

Of course, messing up a system like that should only be done by trained professionals. Kids, don’t try this at home!

*****
Messing up my own system is not nearly fun enough. Immediately after OOW2007, I started a habit of telling people how to mess up their system through the OTN forum. I’m happy to say that I messed up enough of these to get that nice “active member” gold icon next to my name 🙂


DBA on a deserted island

Sometimes I like asking myself – if I had to manage large databases on a deserted island, what tools would I take with me? What do I really use day in and day out and would not want to administer my DB without?

Let’s start building the system from the ground up:

Netapp storage will be a must. I’m addicted to many of Netapp’s capabilities, such as snapshots and flexible cloning, and it would be difficult to live without them. I also prefer my storage on NFS.

The servers would be Intel, 64 bit, lots of memory, several fast network cards. I don’t care about much more than that.

Linux, of course. I’m used to RH, but will work with anything. Bash, perl, gvim, sar, vmstat, top, ps, ssh and all my scripts and aliases should be included. Don’t forget to install rlwrap.

The DB? Oracle of course. 10.2.0.3 works great, but 11 may be more fun. Can I take both? Throw in a few patches. Throw in the relevant Oracle documentation and books, and can I also download the PSOUG library?

It is not absolutely mandatory, but performance tuning goes much faster with proper tools. Either Confio’s or Oracle’s will do.

That’s it. I’m ready to rock 🙂

Needless to say, I did not get paid by any vendor for mentioning them here. On the contrary, my company pays them quite a bit and they deliver the goods.


Anecdote about Rebuilding Indexes

A year ago I was handling a rather confusing performance issue. It was the usual problem – a query that took forever to run. While checking the issue, I discovered that one of the main tables the query touched had amazing size variance: at different times it held anything between 3 rows and 3 million rows. This is obviously a design mistake that can indeed cause performance problems. An experienced DBA might have configured a job to collect statistics very frequently, or maybe done something about bind variable peeking, or just gone to the developers and demanded an immediate fix.

I’m not an experienced DBA. I noticed that the indexes were highly fragmented with lots of deleted leaf rows, and so I decided to rebuild the indexes. An online rebuild, so users would not be affected, and parallel, so it would go faster.

It was a disaster. First of all, rebuilding online does affect performance. A lot. Users did notice. The table was effectively locked until we killed the rebuild. It’s been a year and Oracle support is still silent about whether this is expected behavior for an online rebuild. Also, since the index was rebuilt with the parallel option, many of the queries started running in parallel, which also had horrible effects, and we had to fix that issue too.
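For the record, the statements involved were along these lines (the index name is made up). The parallel degree used for the rebuild stays on the index afterwards, which is why queries started going parallel, and resetting it was the extra fix:

alter index orders_big_i rebuild online parallel 8;

-- later, the cleanup: reset the degree so the optimizer stops
-- choosing parallel plans because of the index
alter index orders_big_i noparallel;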

After we cleaned up the entire mess, I got a stern lecture from Senior DBA: “You can’t solve performance problems by randomly rebuilding indexes. You must thoroughly understand the problem first”.

Fast forward to this weekend. Customer complains about very slow queries. A quick check on v$sql shows that each execution retrieves about 100 rows, but does about 10 million buffer gets. Ouch. The explain plan looked reasonable and I couldn’t see any cause for the excessive gets. I decided to wait with this until Monday, run a trace and see what is really happening there. I never got to run that trace. Senior DBA got home later that day, saw the emails about that query and immediately decided to rebuild indexes!
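The quick check on v$sql was something in this spirit (not the exact query, and the filter is made up):

select sql_id, executions, buffer_gets, rows_processed,
       round(buffer_gets / nullif(executions, 0)) gets_per_exec
from   v$sql
where  sql_text like '%suspect_table%'
order  by buffer_gets desc;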

Needless to say, rebuilding indexes works great when you really understand the problem. It took 2 hours, caused no additional performance issues, and after it was done, the query required only 1000 buffer gets.

I still have some suspicions that the rebuild didn’t change much and what really caused the difference was the fact that Senior DBA also collected statistics on the entire table and all its indexes immediately after the rebuild, but we can’t really know, and as far as I saw – the plan did change.

Regardless of the exact facts, this reminded me of an AI Koan:

A novice was trying to fix a broken Lisp machine by turning the power off and on. Knight, seeing what the student was doing, spoke sternly: “You can not fix a machine by just power-cycling it with no understanding of what is going wrong.” Knight turned the machine off and on. The machine worked.

(Knight is the inventor of the Lisp machine).

You can read more AI Koans here. Highly recommended.


Interesting observation about “drop user cascade”

I started two “drop user cascade” commands – one from SQL Developer on my machine, and the other over ssh to the server, running sqlplus locally (a normal run, without nohup or anything like that). Each of the users had about 70,000 objects, so a few hours later, when I had to go home, I closed SQL Developer (reasoning that it can’t possibly cause a rollback), left the ssh session open, disconnected my laptop from the network, and went home.

The next morning, the user I was dropping from SQL Developer had the same number of objects as it did when I closed the client the night before. But the user I dropped through ssh and sqlplus was completely gone. Obviously the server process didn’t notice that the client went away and continued to run.
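The morning-after check itself is nothing fancy – something like this, with made-up schema names:

select owner, count(*) object_count
from   dba_objects
where  owner in ('USER_A', 'USER_B')
group  by owner;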


Maintaining Database for Vendor Applications

I made a few new year resolutions about being a kinder, gentler person, so I’ll say this as nicely as possible:
Vendors sometimes make their DB related software very challenging to maintain.
I’ve spent the last three days trying to run some very routine maintenance on a schema that contains data for a specific vendor application. The original time estimate was two hours. I was about to publish the name of the software and the company and warn you to do your best to avoid it, but I ran into similar issues last month with a different vendor’s software, and three months ago it was yet another one, so I won’t point fingers. If you have to maintain a database for a vendor application, you are probably screwed.

A few challenges that are common to vendor applications (if you are a vendor – do your best to avoid these traps):

  1. No instructions on how to configure the application if you need to move the schema from one DB to another. This is a common task. Really.
  2. DB Maintenance instructions in the maintenance documentation that say “Disable cost based optimization” and “rebuild indexes frequently”. DBAs read this as “Clueless vendor”.
  3. No built-in data purging policy or good way to clean old data from the schema. We can’t let the application grow as much as it wants. Unfortunately, disk space is still expensive.
  4. Doing a “select * from huge_table” during many trivial operations. Or more than once, on the same huge table, during the same trivial operation. You do realize that this makes the application slow?
  5. Scary bugs in DB maintenance utilities. I realize that bugs are hard to avoid, but when an aggregation utility fails to aggregate data, it should not delete the data it just failed to aggregate. Especially not by deleting tiny batches of the data and committing after each batch, making it impossible to use flashback for recovery and forcing a full fledged recovery process.
  6. Maintenance utilities that do very strange things. I started the vendor’s data purging utility this morning. Six hours later it is still running (22% complete) and has yet to run a single query. What the hell is it doing?
  7. Undocumented maintenance utilities – Maintenance instructions tell me to run “db reorganization tool” regularly. I have no clue what this will actually do on my DB, and as a responsible DBA I simply can’t run this on production.

As a general word of advice – Dev managers of DB intensive software would do well to read Chapter 10 of “Oracle Insights: Tales of the Oak Table”. It’s called “Design Disasters” and was written by Jonathan Lewis. It contains lots of other traps to avoid that I did not mention here, because Jonathan Lewis already explained them very well indeed.


Fast… Faster… Fastest DB Upgrades

Did you notice that maintenance windows are getting shorter?

Five years ago, when I still managed the apps team, DBAs could shut down the database for maintenance for an entire day. Three years ago, a four-hour window every month was acceptable. Now we are down to two hours.

Upgrading a DB to the latest version was always time consuming. Just the installation can easily take an hour. Then there is the catalog upgrade, and recompiling. Apply a few patches and your maintenance window is long over. And that is assuming everything goes according to plan.

So, here is our latest take on fast upgrades:

  1. Install the new version of Oracle Software on a new server, with patches and all. Don’t create a DB.
  2. During maintenance window – stop the old DB.
  3. Use Netapp’s SnapMirror technology to copy all the DB files (data, redo, archive, control) to the new server. Make sure it “looks” exactly like it did in the old server – same mount names.
  4. Copy spfile to new server.
  5. startup upgrade
  6. run catalog upgrade (see the sketch after this list)
  7.  Restart DB
  8. Recompile
  9. Start Listener
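To make steps 5 through 8 concrete, here is roughly what runs inside the window. This is a sketch, assuming a 10.2-style manual upgrade where catupgrd.sql does the catalog upgrade and utlrp.sql does the recompilation:

SQL> startup upgrade
SQL> @?/rdbms/admin/catupgrd.sql
SQL> shutdown immediate
SQL> startup
SQL> @?/rdbms/admin/utlrp.sql

Starting the listener (step 9) and letting the applications back in are the only things left after that.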

We did all this in less than an hour, with the catalog upgrade taking about 30 minutes.


More about Change Management

I’ve written in the past about change management processes, and if you’ve read my old post you know that I’m a huge fan of change management. However, I’ve noticed that while everyone loves change management in theory, most of us hate it in practice.

I suspect that the software we use for change management makes it such a pain to manage changes that people are willing to take huge risks in order to avoid using it. We are currently using a home-brew system for our change management, but we will soon start using a commercial system that should be customized for our use. I’m trying to collect requirements on what the change management software should do. My main goal is to have a system that will not be a pain and will not hinder the change management process; anything better than that is a bonus.

  1. The system should have good response times. If a user already knows everything he wants to enter into the change request, filling the request should not take more than a minute or two. Users will start feeling physical pain when using the system if the change request form contains pop ups that take 30 seconds to display.
  2. It should be extremely easy to find changes you are looking for. For every change that someone creates, there will be 3-5 people that will want to read the change and will have to look for it. Make it easy for them and you just doubled the productivity in your team.
  3. The system should support whatever processes you have in place. Suppose that a change is considered major if it causes a downtime of over two hours, and major changes should be signed off by a customer representative. You want your system to recognize that a change is major and send email to the customer representative with a URL that allows him to approve. You don’t want your DBA to realize this a day before the change, look for the customer representative, ask her to send email with her approval and paste the approval into the “notes” section of the change.
  4. You want to be able to get useful reports out of the system. What percentage of the changes succeeded? Was there a difference between RAC and non-RAC DBs? How many changes were rejected by customers? What were the reasons for rejections? How long does it take to approve a change? What’s the 90th percentile?
  5. You want to be able to do planning meetings and customer notifications from that system. What changes are going to be done next week? In two weeks? Which systems will be affected? Were customers notified?

I’m sure that by the time the new system arrives, the list will have grown significantly.

*

Netapp makes decisions by consensus – and lives to blog about it.
Steve, on the other hand, thinks that the storage business is dead. I hope Netapp will survive this, because they are such a fun vendor with a fun company culture.

*

I registered for Oracle Open World 2007! I hope I’ll get to meet a few of you there – it can be loads of fun.