I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Cloning Oracle Home from RAC to Stand-Alone July 31, 2010

Filed under: Linux,Oracle,rants — prodlife @ 1:50 am

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

This should have been the easiest task on my todo list: Install Oracle 10.2.0.3 EE standalone on a new Linux RHEL 5 server, later to be used as a standby for a production RAC system. This means 2 lines of “runInstaller -silent …”, less than 5 minutes of DBA work and maybe 20 minutes of waiting. I did not expect to spend over 5 hours doing this.

Problems started when I discovered that I don’t have the 10.2.0.3 patchset and another patch that exists on production and should be installed on the standby. I had to wait for my Metalink credentials to be approved for this customer CSI before I could download the patches for them.

“Why don’t you just clone the software from production?” asked a helpful colleague.

Sounds like a great suggestion. I cloned Oracle software before and it is a simple process: tar $ORACLE_HOME, copy the tar file to the new server, untar, run the cloning script which will register the new home with the inventory, and you are done!

In theory, at least.

Here is what actually happened:

  1. Tar, copy, untar, script
  2. Ran OPatch to verify that the new oracle home is in the inventory and that I see the correct version and patches.
  3. OPatch is showing two nodes. Oops. I didn’t realize oracle home has information about the cluster – didn’t Oracle move the inventory elsewhere? Spent an hour looking for the cause of this.
  4. Found that the two nodes are mentioned in $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
  5. Removed this file.
  6. Detached Oracle Home to clean inventory without deleting the software.
  7. Ran the clone script again
  8. Yay! OPatch looks good now.
  9. Decided to create a test database to be extra sure everything is fine
  10. NETCA failed with linking error. Spent an hour figuring out why. Cursed a bit.
  11. Had to install glibc-devel, 32 bit version. Too bad RDA didn’t catch this.
  12. Created test database, but SQLPLUS now fails with linking error. More cursing. Wondered what I did to deserve this.
  13. libaio.so.1 was missing so I had to install the 64 bit version of libaio. Too bad RDA was silent about this as well.
  14. Couldn’t start the database because the database couldn’t find the cluster. Why was it even looking for a cluster? Spent an hour figuring out why. Ah, because I copied the software from a RAC server and it was linked as a RAC database.
  15. Relinked everything with RAC_OFF option.
  16. Finally things are working. Too bad it is 8pm already.

What I should have done: (I’m not sure if it is supported by Oracle, but at least it works)

  1. Double check that we have all RPMs.
  2. Tar, copy, untar
  3. Remove $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
  4. Run clone.pl: clone/bin/clone.pl ORACLE_HOME=/appl/oracle/product/10.2.0/db_1 ORACLE_HOME_NAME=OraDb10g_home1
  5. Relink as non-RAC (from $ORACLE_HOME/rdbms/lib): make -f ins_rdbms.mk rac_off
  6. Verify with OPatch.
  7. Create test DB:
    netca /silent /responsefile ~/netca.rsp
    dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName TST -sid TST -SysPassword xxx -SystemPassword xxxx -emConfiguration NONE -datafileDestination /appl/oracle/oradata  -storageType FS -characterSet WE8ISO8859P1 -nationalcharacterSet AL16UTF16 -memoryPercentage 40
  8. Go for a nice afternoon ride.
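
The steps above can be strung together as a small shell script. This is only a sketch under a few assumptions: the tarball path and the `run` and `clone_home` helper names are hypothetical, the tarball is assumed to unpack into the parent directory of $ORACLE_HOME, the RPM names are the usual RHEL ones, and the `ioracle` make target is not in the list above (it is the usual follow-up that rebuilds the oracle binary after `rac_off`). It defaults to a dry run that only prints the commands, so you can read them before running anything for real.

```shell
#!/bin/sh
# Sketch of the cloning steps above. By default nothing is executed:
# DRY_RUN=1 prints each command, DRY_RUN=0 actually runs it.
ORACLE_HOME=${ORACLE_HOME:-/appl/oracle/product/10.2.0/db_1}
ORACLE_HOME_NAME=${ORACLE_HOME_NAME:-OraDb10g_home1}
TARBALL=${TARBALL:-/tmp/oracle_home.tar}   # hypothetical location of the copied tar file
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clone_home() {
    # 1. Check the RPMs that were missing in my case (assumed RHEL package names).
    run rpm -q glibc-devel libaio || return 1
    # 2. Untar; assumes the tarball was created from the parent of ORACLE_HOME.
    run tar -xf "$TARBALL" -C "$(dirname "$ORACLE_HOME")" || return 1
    # 3. Drop the file that still lists the RAC nodes.
    run rm -f "$ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml" || return 1
    # 4. Register the new home with the inventory.
    run perl "$ORACLE_HOME/clone/bin/clone.pl" \
        "ORACLE_HOME=$ORACLE_HOME" "ORACLE_HOME_NAME=$ORACLE_HOME_NAME" || return 1
    # 5. Relink as non-RAC (ioracle is an assumption, see the note above).
    run make -C "$ORACLE_HOME/rdbms/lib" -f ins_rdbms.mk rac_off ioracle || return 1
    # 6. Verify the version, patches and node list with OPatch.
    run "$ORACLE_HOME/OPatch/opatch" lsinventory
}

clone_home
```

Once the printed commands look right for your paths, rerun with DRY_RUN=0 as the oracle user. The test database in step 7 is left out on purpose, since netca and dbca need a response file and passwords of your own.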

I hope that I’m not the only DBA who always has to find the most difficult way to accomplish a task, and that this post will be useful to others. Perhaps the best piece of advice I can offer is to avoid this type of cloning in the first place.

 

BAAG, Best Practices and Multiple Choice Exams July 29, 2010

Filed under: musing,mysql — prodlife @ 9:19 pm

(This post originally appeared at the Pythian blog)

I’ve been following the discussion in various MySQL blogs regarding the sort_buffer_size parameters. As an Oracle DBA, I don’t have an opinion on the subject, but the discussion did remind me of many discussions I’ve been involved in. What’s the best size for SDU? What is the right value for OPEN_CURSORS? How big should the shared pool be?

All are good questions. Many DBAs ask them hoping for a clear cut answer – Do this, don’t do that! Some experts recognize the need for a clear cut answer, and if they are responsible experts, they will give the answer that does the least harm.

Often the harmless answer is “Don’t touch anything, because if you have to ask this question you don’t have the experience to make the correct decision”. As Sheeri noted, it is a rather patronizing answer and it stands in the way of those who truly want to learn and become experts.

But I can appreciate that it comes from long and bitter experience. Many users read random bits of information off the web and then rush to modify the production database without fully digesting the details. They end up tuning their database in a way no database should ever be tuned. Not even MySQL.

I used to think that users search for those “best practices” out of laziness, or maybe a lack of time. I used to laugh at the belief that there are best practices and clear answers, because if there were – we wouldn’t have a parameter. But now I think the problem is in the way most institutions evaluate intelligence, which affects the way many people approach any problem.

Even though all of us DBAs come from a wide variety of cultures, I’m willing to bet that every one of us had to work his way through a multiple choice test. If you ever took an Oracle certification exam, you know what I mean:

How do you find the name of the database server?
A) ORACLE_SID
B) ORACLE_DBNAME
C) DB_ID
D) none of the above

You run into those in certification exams, job interviews and in a slightly different variation when you try to get accepted to a university. You had to learn to succeed at those multiple choice tests at a very early age, or you would be labeled “less intelligent”.

Yet those questions are absurd. In the question above, the answer could be A, but A would be wrong if my database is a RAC cluster. Besides, a much better way would be to use /etc/oratab, because there may be more than one DB on the machine.
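
As a rough illustration of the /etc/oratab point, this sketch lists the SIDs registered on a machine. It assumes the usual `SID:ORACLE_HOME:Y|N` entry format with `#` comment lines; `list_sids` is a hypothetical helper name, and the file only knows about the databases someone registered in it.

```shell
#!/bin/sh
# List the SIDs found in an oratab-style file (default: /etc/oratab).
# Assumed entry format: SID:ORACLE_HOME:Y|N, with '#' starting a comment line.
list_sids() {
    oratab=${1:-/etc/oratab}
    # Skip comments and blank lines, then print the first colon-separated field.
    awk -F: '!/^[ \t]*#/ && NF >= 2 && $1 != "" { print $1 }' "$oratab"
}
```

Calling `list_sids` with no argument reads /etc/oratab; passing a path lets you try it on a copy of the file first.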

But you can’t have a discussion with the exam author. You can’t ask for assumptions and clarifications and you can’t even explain your assumptions in the test. What’s more, these tests also check for speed, so you don’t have much time to contemplate the options.

What these exams teach you is that every question has a single solution and one that is so obvious that once you see it, you recognize its rightness in less than 30 seconds. They also teach you that someone else knows the right answer (the person grading the test). So finding the right answer can be a matter of getting the “expert” to give you the one and obvious correct answer.

If our society teaches this point of view from a very young age, why are we surprised that DBAs keep looking for the one obvious answer?

I fully support the BAAG cause, and I believe that DBAs should be given a full explanation of the problem involved, the different choices that exist, their meaning, the trade-offs involved and, in general, the tools to make the right decision themselves time after time. But we should realize that undoing years of faulty teaching can take a while.

There is an even worse side effect to those multiple-choice-should-be-obvious tests: you may learn never to ask for clarifications, or that asking for them is “bad” or “wrong” in some way. In any real problem, asking for clarifications such as “Why are you asking this?”, “What is the real issue you are trying to solve?” and “How will you use this script?” is the most important part of finding a solution. It is not cheating – it is doing a professional job.

It is a scary thought that the very way we train and evaluate the future DBAs is something that will prevent them from doing a good job of being DBAs.

 

On the difficulties of Migrations – Especially to new Blogs July 29, 2010

Filed under: Uncategorized — prodlife @ 5:46 pm

I haven’t posted here in a long while. That’s because I’ve been posting all my stories and ideas over at the Pythian blog.

I knew that migrations are one of the most difficult tasks in IT operations, but I did not realize this also applies to blogs. Yesterday, Alex helped me look at the blog statistics over at the Pythian blog and it turns out that over there I have about 10% of the readers that I had over here. While I’m just as brilliant in the Pythian blog as I was here, I guess that with all the old links, Google ranks and people not changing their RSS subscriptions – blog locations have a lot more momentum than I suspected.

Anyway, to the 90% of my readers who apparently only read me at this address, in the next few days I’ll copy over the blog posts that I neglected to post here. I’ll try to post new articles here in the future, but they will always appear in the Pythian blog first, so you really should add my new address to whatever it is you use to follow blogs.

 

 