I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Looking for great PL/SQL June 24, 2008

Filed under: development, musing, plsql, tips — prodlife @ 2:23 am

Steve Yegge is a developer who writes way too much. Each one of his blog posts can easily take an hour to read, but they tend to contain nice gems that make the long read worth the time.

Today I spent an hour reading about 50% of a fairly interesting article about how to practice programming. His main point is that just programming on the job will not make you a better programmer. You have to actively strive to improve. (I just saved you an hour of reading!). One of his suggestions on how to become a better programmer is this:

Read through someone else’s code for 20 minutes. For this drill, alternate between reading great code and reading bad code; they’re both instructive. If you’re not sure of the difference, ask a programmer you respect to show you examples of each. Show the code you read to someone else, and see what they think of it.

Sounds like a good advice. Now I only have to find great and terrible code examples. Well, if I want to practice C, this is not an issue – for great code examples I read Linux Kernel code, and for awful examples I read code that I wrote 7 years ago (and it looks worse every year).

But for PL/SQL , I’m a bit stuck. I can still read my own code for bad examples, but where can I find examples for great code? My colleagues write nice enough code, but we have our own coding style and standards, and if something is wrong with these, it’ll be difficult to tell without outside input.  Somehow, there is simply no open-source code written in PL/SQL that I can read to get a good idea of how PL/SQL should be written.

The closest I’ve found is the code examples in Steven Feuerstein’s PL/SQL book. These have been extremely helpfull in teaching me how to handle exceptions, iterate through cursors, work with associative arrays, etc. But these are just examples, I can’t learn how to cleanly build a fairly large working module from these great examples. Its nothing like reading through the Linux kernel.

I think the issue is somewhat related to the problem Tyler Muth raised about lack of reusable PL/SQL modules. Oracle has an amazing community, but for some reason we don’t share our code and we don’t participate in open-source community projects. Most of our community efforts revolve around helping each other learn new features and solve performance issues.

We are probably missing an opportunity to help each other resolve performance problems before they occur.

 

ORACLE_HOME – to share or not to share June 19, 2008

Filed under: rac — prodlife @ 2:41 am

When setting up a RAC system, one of the questions that tend to come up is whether to have one ORACLE_HOME per node on local disk, or to have one ORACLE_HOME for both instances on the shared storage. These approaches are sometimes called “private home” vs. “shared home”.

Oracle have an amazing white paper on the topic, which I’ve been reading for the last two days: http://www.oracle.com/technology/products/database/clustering/pdf/oh_rac.pdf

We have both types of systems here (because we have two teams of DBAs with somewhat conflicting procedures), and I worked on both. As with many questions, there are pros and cons to each approach, and you decide based on your priorities. The paper (highly recommended!) covers almost all the pros and cons in lots of details. However, I still want to give here a short summary, based on my experience.

Why use private home?

  1. Easier rolling upgrades. If you need to patch your DB you can patch first one node and then the other, allowing for 0 downtime patches.  You can have rolling upgrades on shared home as well, but it is a longer procedure and not supported automatically by OPatch.
  2. You can’t lose your entire cluster by a careless delete. Never underestimate the impact of human errors.
  3. With shared home, mistakes done in ORACLE_HOME impact the entire cluster. With private homes – mistakes impact just one node.
  4. Add node / Delete node procedures are somewhat simpler (but take longer) on private homes. Especially when doing delete node, you don’t run the risk of deleting the shared oracle home by mistake. Very scary!
  5. Starting 10.2.0.4 Oracle demands that at least the oraInventory will be local.
  6. Oracle recommends local home. Sometimes, that’s a good enough reason.

Why use shared home?

  1. Quicker installs and upgrades, because there is no need to copy all files twice, over the interconnect.
  2. If your shared storage is advanced enough to support snapshots, you have the ability to take a snapshot of ORACLE_HOME before applying a patch and simply restoring the snapshot if the patching went wrong.
  3. You can really easily migrate nodes or entire DBs from server to server that way.
  4. No version compatibility issues between different nodes on cluster. You know that all your nodes are always running exactly same version, patches, etc.
  5. You don’t have to ssh from server to server while trying to track issues in alert logs and dump files.
  6. Shared storage tend to be more stable, have checksum, striping, mirroring and other nice stuff. With ORACLE_HOME on this storage, you are less likely to lose a node to media failure.
 

Few Links of the Week and a Biased Recommendation June 14, 2008

Filed under: books, links — prodlife @ 12:04 am

Because Pythian’s Log Buffer is missing due to unexpected appendix, and I can’t leave my readers with nothing to do all weekend.

Jonathan Lewis suggests browsing through V$SQL and provides a nice script.

OracleNerd explains how to shop for cars using SQL. I thought I’m nerdish for planning my purchases in Excel, but SQL is by far a nerdier method.

Marco Gralike gives a very cute HTTPURI example.

Tanel Poder shows the memory overhead of generating rows with “connect by” and shows a short and sweet solution.

Laurent Schneider’s Advanced Oracle SQL Programming book is now available for sale on Amazon. I’ve been lucky to get the chance to review the book. It covers some of the most advanced and exciting aspects of SQL programming, and it is full of useful, practical examples. Almost everyone who uses Oracle SQL regularly in his work can benefit from the information and ideas in this book. I know my programming skills improved significantly from reviewing it (and even my co-workers noticed!).

Not many posts this week. Maybe it is related to Euro2008?

 

Collecting Requirements for Disaster Recovery June 13, 2008

Filed under: Availability, tips — prodlife @ 6:04 am

When an earthquake wipes out your datacenter, it may be too late to do anything about it. Obviously you need to plan for such disasters in advance. Every IT organization, big or small, needs such plan. I hope your organization already has one plan, and that you test it on regular basis. But sometimes you need to plan for disasters from almost scratch. Maybe because your business never had a disaster recovery plan, or maybe because availability requirements have suddenly changed and the previous plan is insufficient.

So, How do you start writing your disaster recovery plan?

If you are an Oracle DBA, you may be tempted to start by configuring DataGuard. If you are a sys admin, you may be ordering additional machines and calling various ISPs. If you are a storage manager, you’ll probably pull out your vendor’s favorite remote mirroring solution. If you are sales/marketing, you probably already promised 99.99999% availability.

Don’t do any of that. You start by asking questions. Here are the questions we thought of a bit too late this time around, but next time we’ll ask before we even begin to discuss the right technology:

  1. What is acceptable time to recovery? Can we just ship the tapes somewhere, or do I need hot standby?
  2. How much data loss is acceptable? Can we recover from last nights backup, or do we need data from 5 minutes ago?
  3. How much performance degradation is acceptable during a disaster? For how long? Can I save a bit on the extra hardware?
  4. How much redo logs are generated per day? i.e. what is the rate of data changes that we need to support now?
  5. What is the expected data growth for this DB/App for the next year? How much will we need to scale our solution?
  6. How will clients access the system in case of disaster? Do we need to migrate IPs or can you use new ones?
  7. How often do we need to validate the DR site? Testing every quarter, 6 month, once a year?
  8. When does the DR need to be in place?
  9. How much of a downtime will be acceptable for returning back to the main site? How much in advance do we need to schedule it?
  10. Who decides that it is now a disaster and failover to alternate site (or backups) should occur? What are the criteria for the decision?

From my experience, the fewer questions you ask, and the simpler the questions are, the more likely you are to get good answers. And with good answers, you can choose your technologies, implement, test, rinse, repeat.

 

Notes about OCP June 7, 2008

Filed under: links, musing, ocp, tips — prodlife @ 1:30 am

Hey, I passed my OCP exam today. Now I just need to take care of the hands-on course requirement and you’ll be reading the blog of an Oracle Certified Professional. It is more interesting already, right?

I already explained my study method when I took the OCA. This time I followed a similar plan, with three main changes:

  1. I did not hike up Mount Shasta the weekend before the exam.
  2. When I studied for the OCA I already knew all the material before I began studying. Simply because I spent years installing and maintaining Oracle. For the OCP, most of the material was new. I was completely unfamiliar with RMAN, MMON alerts, flashback database, recyclebin, IOT, clusters and resource manager. This meant that I had to do a lot more practice in order to feel comfortable with all those tools. I spent an entire weekend backing up and restoring with RMAN, and it was time well spent.
  3. I followed good advice from my favorite OCM and studied quite a bit from Oracle’s official documentation. Mostly the backup and recovery guides. I found them way more accurate and more readable than the exam guides, and as an added bonus – you can send to Oracle any problems you find in the doc. I enjoy complaining about mistakes so much that the study process was actually fun.

Which brings me to a very important point – when you take mock exams from any of the study guides, verify each and every one of their answers with the oracle docs or with a live system. I found about 15% mistake rate in their answers, and you really don’t want to memorize the wrong things.

Its unbelievable, but I can honestly say that I’m a much better DBA now that I’ve studied for the OCP. I’m much more comfortable with most backup and recovery scenarios (Few years ago I caused unnecessary data loss while recovering a corrupt data file, simply because I did not know not to restore redo logs!) and I know how to use new important features that I didn’t know about before (or at least didn’t look into seriously).

So, my two cents for value of certification for employers debate: DBA with four years and experience and an OCP is probably better than four years of experience and no certification. I know I am.

What’s next for me? My team lead wants me to go for OCM. I’m not sure I’m quite there yet. I’m thinking of 11g upgrade and RAC Expert certificate. But before all that, I promised myself a white-paper and a presentation this year.

***

Lots of good posts this week:

Tanel Poder wrote an incredible V$ sampling query. Its simply brilliant. He also showed how to use his Snapper tool to find who is generating excessive redo and saved me a long session with logminer last weekend.

Steven Feuerstein shows how to run OS commands from PL/SQL.

Happy birthday to Syed Jaffar Hussain!

Alejandro Vargas posted a good RMAN practice.

Rob Van-Wijk wrote nice string aggregator using model. (Because Oracle SQL programmers do it with models!)

 

Joining Unrelated Data June 3, 2008

Filed under: scripts, tips — prodlife @ 8:36 pm

I’m sure this happened to every DBA – an important customer sends an Excel spreadsheet and explains that he wants this report generated from the DB on a weekly basis.

At first look the report looks easy enough and you say “Sure, you’ll have it ready by the end of the week”, but on second look something seems off and you give the customer a call “Listen, it seems that the data in the columns is not really related. First column contains your customers and the second contains the contact emails for each customer, which is fine. But the third column contains the open issues for the customer , and it is not really related to the contact emails column.”
And the customer calmly replies “You are right, it isn’t related. Thats how I want the report. I’ll have it ready by Friday, yes?”

No choice but to write a non-relational join. I do this by generating a row id column and joining on that.
For example:

SQL> create table customers (cust_id number, cust_name varchar(20));

Table created.

SQL> insert into customers values (1,'cust A');

1 row created.

SQL> insert into customers values (2,'cust B');

1 row created.

SQL> insert into customers values (3,'cust C');

1 row created.

SQL> create table contacts (cust_id number, email varchar(64));

Table created.

SQL> insert into contacts values (1,'a1@a.com');

1 row created.

SQL> insert into contacts values (1,'a2@a.com');

1 row created.

SQL> insert into contacts values (2,'b1@b.com');

1 row created.

SQL> insert into contacts values (3,'c1@c.com');

1 row created.

SQL> insert into contacts values (3,'c2@c.com');

1 row created.

SQL> insert into contacts values (3,'c3@c.com');

1 row created.

SQL> create table issues (cust_id number, description varchar(4000));

Table created.

SQL> insert into issues values (1,'Late delivery');

1 row created.

SQL> insert into issues values (1,'Missing components');

1 row created.

SQL> insert into issues values (2,'Wrong address');

1 row created.

SQL> insert into issues values (3,'Malfunctions');

1 row created.

SQL> col email format a10
SQL> col description format a20
SQL> select nvl(t1.cust_name,t2.cust_name),email,description from
  2  (select dense_rank() over (partition by cust_name order by email) n, cust_n
ame,email
  3  from customers cr join contacts cn on cr.cust_id=cn.cust_id) t1
  4  full outer join
  5  (select dense_rank() over (partition by cust_name order by description) n,
cust_name,description
  6  from customers cr join issues i on cr.cust_id=i.cust_id) t2
  7  on t1.cust_name=t2.cust_name and t1.n=t2.n;

NVL(T1.CUST_NAME,T2. EMAIL      DESCRIPTION
-------------------- ---------- --------------------
cust A               a1@a.com   Late delivery
cust A               a2@a.com   Missing components
cust B               b1@b.com   Wrong address
cust C               c1@c.com   Malfunctions
cust C               c2@c.com
cust C               c3@c.com

6 rows selected.

Very ugly. I hope the customer will be happy with his report.