I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

3 lessons I learned today, the hard way August 30, 2007

Filed under: development,maintenance,tips — prodlife @ 2:33 am

Sometimes you have a difficult day. I heard that turning difficult days into lessons and viewing them as an opportunity for growth and improvement should help. So here is what I’ve learned today:

1) Always, always work from spec. That is, a detailed description of what the project should do, and what it shouldn’t. This description should be accepted by the customer representative. You will get in trouble if you skip this, even if it is a tiny and obvious project. I got a project to write a utility that verifies that the customer gets his data dump for each of his 100+ schema every day and alerts us if it didn’t. Sounds obvious, right? So I worked without spec, only to learn later that the customer expected the utility to verify that the file was created before 3am. Not a huge thing to fix, but its still better to get this information before it becomes a production issue

2) Keep an updated list of all the development projects in the department, who requested the project, important dependencies and who is responsible for it. Six month ago we upgraded a bunch of customers to a new version of the product and moved them to a new DB. The old schema were deleted last week. Today the application owner found out that an important procedure that the customer used in the old version was never copied to the new DB. None of the DBAs knew that the procedure existed, so it was ignored in the migration procedure. Now the DBAs can recover the procedure from backups, but the application owner is still looking for a developer than will make the procedure match the new application version. You can prevent this sort of confusion by having a list of projects, their customers and a developer who should maintain each.

3) When you type: create global temporary table as select from.... don’t be surprised if you get an empty table. You need to type create global temporary table on commit preserve rows as select from.... Create as select has an implicit commit in there.
Also, The following query will fail with “invalid number” error:
select 'number of header blocks in file ' || file_name || ' is ' || blocks-user_blocks from dba_data_files
but this will work:
select 'number of header blocks in file ' || file_name || ' is ' || (blocks-user_blocks) from dba_data_files
Apparently || operator has precedence over + operator which makes the first version meaningless.

Hopefully, I’ll never make any of these mistakes again.

 

Recommended articles from Metalink August 29, 2007

Filed under: Linux,metalink,Oracle — prodlife @ 1:20 am

Another DBA team in the company hired a consultant to review and comment on their RAC installations, and they shared his report with us. Some of the recommendations were rather mysterious and involved changing kernel parameters I was not used to changing, so I decided to look around metalink to see if these parameters and their effects are documented somewhere.

I didn’t find what I was searching for. I have lots of trouble getting the metalink search engine to find anything useful for me. Oracle can probably get rid of many support calls by making a deal with google and using them for the metalink search. But while I was looking, I ran across two cool articles.

Note:265262.1 – Things to Know About Linux
A very cute document summarizing everything a DBA ever wanted to know about Linux, or almost. Links to installation guides, useful prerequisites and parameters, how to find out important information (like whether you are running 64 bit or 32), how to relink the kernel and even debugging tools.
A must read, must bookmark, print and hang besides the desk. This one is a keeper.

Note:169706.1 – Oracle Database on AIX,HP-UX,Linux,MacOSX,Solaris,Tru64 Operating Systems Installation and Configuration Requirements Quick Reference (8.0.5 to 10.2)
They are cheating with the name – the document also include 11g.
A useful and concise summary of the installation pre-reqs on unix platforms. Nice if you are doing many installations on different platforms and are tired of looking up a specific detail in a huge installation manual.

 

How MySQL snuck up on us August 28, 2007

Filed under: mysql — prodlife @ 3:04 am

In some parts of the geek universe, MySQL is pretty much the only database in existence. If you are a web developer, using python, perl or ruby and want to develop a web site of almost any size, you will almost certainly choose MySQL. In the web development world, it is the obvious choice. It is cheap, it does the job and everyone else is using it too.

In the enterprise application world, MySQL is barely being acknowledged. I spent 30 longs minutes on the phone with Netapp support before they finally realized that I’m talking about MySQL, which is not SQL Server and they don’t have a snapshot based backup solution for us. I’m not disparaging Netapp support. Netapp has the best support I’ve ever encountered, but MySQL is simply not part of their world.

So, we have a world of free databases and cheap storage, and we have the world of Oracle and Netapp. Paul Vallee of Pythian made an excellent case of why Oracle DBAs should start learning MySQL, but I kind of suspect that once the organization got used to free databases, they may be a bit cheap on the DBA salaries too.

MySQL is difficult to avoid these days and it is sneaking up into the most conservative enterprise organizations. Developers like it a lot, for some reason it seems to be easier for them to learn and use than Oracle. Maybe because installing it is a single rpm command and it usually arrives pre-installed with their Linux boxes?

Of course, what developers like ultimately finds its way into production. We are an ITIL organization with very detailed deployment procedures. However, by the time the DBAs are involved, it usually looks like this:
Application Owner: Hey DBAs, we are deploying a brand new application tomorrow. We need you to set up the database.
DBAs: Sure. SQL Server or Oracle?
Application Owner: MySQL. Is that a problem?

Finally we reached a common ground: The application owner understands that his database will get shaky support from us for now, Management knows that we need to purchase MySQL support, and the Oracle DBAs need to learn MySQL. I just hope this won’t decrease my salary.

 

Log Buffer #59: a Carnival of the Vanities for DBAs August 24, 2007

Filed under: links,Oracle,SqlServer — prodlife @ 4:02 pm

As I’ve hinted at the beginning of the week, this week I’m writing up edition #59 of Log Buffer. Dave Edwards of Pythian Group contacted me couple of weeks ago and offered this opportunity. I hesitated quite a bit, this is a big responsibility, and I was not sure my rather new blog is ready for such exposure. I’m glad I agreed. Dave Edwards sent me a bunch of helpful links to other interesting blogs, so I got to discover many valuable blogs I’ve managed to miss so far, and to read about databases that I usually don’t see much. Despite the fact that the links Dave sent included many databases besides Oracle, this Log Buffer will probably be a bit biased toward the worlds most popular DB.

Oracle 11g is still the hottest topic in Oracle blogsphere. Tanel Poder started an 11g internals series with an article about automatic memory management, getting into the implementation in Linux. Kevin Closson continued looking into this topic with a typically well researched article.

Blogging About Oracle also has a series about Oracle 11g, written by different authors. I particularly liked Part 7 – Function Result Cache, written by Roel. He gives a good example that demonstrates the value of this feature.
Virag Sharma, in Database, RAC, Dataguard and Apps writes about SQL Performance Analyzer, one of the hottest features in 11g, and Oracle Brains explain how to tune using 11g’s invisible indexes.

If you are still not tired of 11g, Eddie Awad posted 40+ links to 11g articles and blog posts. This should keep everyone busy during the weekend.

Oracle Open World is three month away, but Dan Norris and Shay Shmeltzer are already getting ready for the event. I also need to start looking for interesting sessions. there are always so many to choose from.

And in other news: Tanel Poder published a seriously cool script for tracking session level performance stats that work even when you are not allowed to do anything on the database.
Alex Niujten, on the highly useful Amis Technology blog explained how to create table as select when you have nested tables.
In Eye on Oracle, Mike Brunelli is collecting information about the quality of Oracle support. I hope his project will generate many responses and maybe it will even cause Oracle to rethink their support organization.

Daniel Fink, the Optimal DBA has fun with SQL and SQL*Plus while dynamically assigning column names. Don Seiler at Die Seilerwerks writes how he used trace 10053 to determine how CBO does its job, and Jakub Pawlowski points to training material about PL/SQL that Steven Feuerstein published in his site.

SQL Server fans continue to blog about SQLServer 2008, Bob Beauchemin writes about SQL Server’s support for extended events. Jeff Smith writes about composite primary keys. I truly hope that everyone already knows about them, but if you don’t – its a must read, and Mladen Prajdić , at I want some Moore, explains how to modify data in a result set that uses joins. I wish he had written this article few month back when I needed it.

Meanwhile, on the OpenSource web:
Ronald, at Technical Notes blog posts a bunch of links for Oracle DBAs learning MySQL and also advice regarding backup and recovery. Last week I received a task to take ownership of MySQL server and write a recovery procedure for it, I can testify that both articles are very useful.
Charlie Cahoon ends in his Summer of Code blog with a release of his MySQL proxy. Corra is already using MySQL Proxy on Ubuntu.
Morgan Tocker and Brian “Krow” Aker try to decide how big transactions should be, and last but not least </depesz> explains how to secure PostgreSQL.

Whew, that has been one long post. I didn’t realize how much interesting things are being written by DBAs every day. There were so many great articles that choosing and picking them for this post was much more difficult than I expected. I highly recommend every database blogger to try writing Log Buffer once or twice, its a unique experience.

 

Oracle Documents August 22, 2007

Filed under: books,rants — prodlife @ 3:01 am

I’ve run into a nice post about RTFM on the BAAG journal. Yes, I know this is not exactly news, but I think its going to resonate with many DBAs. However, I suspect that the RTFM post oversimplifies what is actually a rather painful issue.

Lets start with the fact that many DBAs are not native English speakers. They know enough English to get along nicely, but reading technical documents can still be a slow and painful activity. Perhaps slower than waiting for a kind soul on a mailing list.

Then, there is the fact that getting meaningful answers out of Oracle documentation is a bit of an art form. There are both OTN documents and documents in Metalink. They sometimes contradict, so you need to verify which is more updated, but make sure it matches your version.

If you need to know how to configure automatic memory manager or how to rebuild an index, the documentation is pretty good. If you need to know what happens when you set SGA_MAX_SIZE to zero while using automatic memory management, you are in for a significant search at the end of which you will have more information and still end up having to make a guess which may or may not be correct. I remember looking for a good quote that will allow me to prove to my boss that if we rebuild an index with parallel option, all queries using this index will be parallelized to the same degree. I couldn’t find one, although it could be inferred by combining several paragraphs from two different books in the right way.

There is a reason for the huge market for Oracle books other than the official documentation. Thats because the official documentation is difficult to read and sometimes is not even good enough as a reference. There are a bunch of websites, blogs and magazine articles that explain information that is already contained in Oracle documents. A co-worker is learning PL/SQL and asked for good book. You can bet I told him to get Steve Feuerstein’s PL/SQL book and didn’t tell him to download Oracle’s book from OTN.

So, I agree with Simon Haslam at BAAG that people need to RTFM. I just wouldn’t recommend Oracle’s official documentation for that purpose.

 

Links of the Week August 20, 2007

Filed under: links — prodlife @ 11:15 pm

I’ll be doing Log Buffer – Carnival of Vanities for DBAs next week. As a practice run, here are the posts I’ve read and loved last week:

Kevin Closson wrote about configuring lots of DBWR processes, I’ve read the first article, but now he already has two new posts on the subject. He really gives good specific advice on when you need multiple DBWR processes and how to configure your system for maximum efficiency in this case.

Dizwell posted an article going over all the new parameters in Oracle 11g.

Gregory Guillou at Pythian Blogs wrote an excellent article about the SQL Performance Analyzer in 11g, with the most detailed example imaginable.

I’ve discovered utPLSQL, a unit test framework for PL/SQL developers. Guaranteed to improve your life and your code.

Citrix bought XenSource. I think its pretty important news which very few people noticed.

Jeff’s business and technology blog posted What Databases should do for me. A post with significant number of factual mistakes but with an interesting idea about automatic indexes.

You can also find a good article on how to install 11g on NFS over at Oracle-Base.

Enjoy the week and don’t forget to check back on Friday for the real Log Buffer!

 

Oracle Performance Training with Kyle Hailey August 20, 2007

Filed under: Oracle,training — prodlife @ 9:16 pm

On Friday, I’ve been to a training day with Kyle Hailey, organized by North California Oracle User Group.

It was really a great experience. The first part of the day was dedicated to ASH – v$active_session_history. We learned how to use the data, how to use the related graphs in Enterprise Manager, and how to get similar data when you don’t have the ASH table (ASH requires 10g and a special license). I already worked with ASH before, but it was still very interesting and usefull.

The second part of the day was dedicated to specific wait events – why they happen and how to deal with them. The talk was very technical – including overviews of the buffer cache memory management (I didn’t know Oracle keeps the buffer cache in a hash table), and explanations of all the scenarios where you can run into TX-Enqueue lock. There are more scenarios than you think, and most of them will be hell to debug.

We also received a copy of “Oracle Wait Interface: A Practical Guide to Performance Diagnostics and Tuning”, I’ve starting reading it during the weekend and it seems like a useful and interesting book. Maybe I’ll review it when I’m done.

I was really impressed by the audience. No one asked stupid and annoying questions, on the contrary, many people had very insightful comments and questions. Everyone I talked to during the breaks was interesting, had interesting experiences to share and great comments when listening to my war stories. Can you tell I’ve had fun? I’ve also learned that I’m probably the only DBA who did not try ASM yet. I really have to get to it.

So, hurray to NoCoug who organized this great day. I strongly recommend Kyle Hailey’s talk to all DBAs.

 

How to automate tasks August 17, 2007

Filed under: design,maintenance,musing,plsql — prodlife @ 1:34 am

Automation has always been the holy grail of information technology departments everywhere. Sysadmins and DBAs always prefer to automate the routine and boring tasks to make more time for interesting projects. Managers also like automation, mostly because this means fewer employees can do more work, and because automated tasks usually involve fewer human errors.

Some tasks are automated to a point where they run automatically at a specific time every day without DBA present. Backups are the most common example of this. More often the DBA writes a script that will automatically do what used to be a series of steps done manually by the DBA at some point, every DBA has the “create schema” script that will create a user with the permissions and objects he is going to need.

After years of automating different tasks, I’ve come up with a method of doing this which usually brings good results, and which I always regret when I don’t follow:

1. Automate the task after you did it manually at least three times. This gives you better understanding of the different scenarios you need to prepare for and also prevent you from wasting time automating tasks that turn out not to be routine after all.

2.  Write down all the steps you do when you perform the task manually. This helps when writing the script, and also serves as good documentation later on.

3. Write a script that does the task based on the steps you wrote earlier.

4. To verify that the script indeed does the right thing I run it in debug mode – step by step, verifying that each step behaves as expected. Step n  is supposed to shut down the listener, verify that after step n the listener is really down.

5. The script should output a log with all the action taken, responses and errors. You will need this when something goes wrong.

6. Run the script several times and verify the results and log.

7. Ask someone else to review the script with you. This can help catch bugs, make sure the script is readable to other people and have the added benefit of having someone else who can maintain the script if needed.

8. If you schedule it using cron, oracle job scheduler or a similar system, add a change to your change tracking system saying that this script now runs automatically every day at 3am.

 

Change Management August 13, 2007

Filed under: maintenance — prodlife @ 10:26 pm

“Change is not merely necessary to life — It is life.” ~ Alvin Toffler

Changes to production systems are a significant part of my job. At any given week there is a patch that should be applied, a customer that wants to restore his database from older backup, SGA that need to be resized, a RAC node to drop, and many many more.

All DBAs know, but hate admitting, that the number one cause of unscheduled downtime is maintenance that went wrong. Yes, we cause most of the incidents in the system. Random crashes are not as common as we prefer to believe. This is actually good news, because if we cause the incidents, it may be possible to cause less of them.

We try to cause less incidents during changes by having a change control process in place. Our process includes a change request form, filled online. The form contains details such as the change procedure (i.e. build index), the verification procedure (check execution plan for query and see that it uses index), risks (table will be locked when we build index, users may be impacted) and rollback procedure (drop the index). Just filling this form and thinking on the process and its risks probably prevented few incidents, and there is the added bonus that you don’t need to figure out the rollback while you are in the middle of a  catastrophe. The next step is that the responsible manager needs to review the change and approve it, hopefully catching any mistake missed by the DBA. In many cases, the manager will ask the DBA to try the procedure on a test system before the change is approved.

As you can imagine, this is lots of overhead work, and it can often be annoying. The important question is: Does it work?
It does in most cases. Writing down the procedure, having someone else go over it and trying it in advance on a test system often catches problems that would have caused a severe incident on production.

When doesn’t it work? There are two cases where the process doesn’t really help:
1. Small and well known changes.  There are some tasks that are really short and easy and that we did hundreds of times already. In this case, the change itself can take 5 minutes but the procedure can take from an hour to few days (if we need to set up a test system from scratch)
2. Random issues. Even the most planned process doesn’t help when something truly random happens.  We upgrading from 10.1 to 10.2 worked great in staging, but in production it hanged, or maybe patch installation that worked great when tested but failed on production. Often, even Oracle support won’t be able to tell you why the process failed just when we really needed it to work. Call it Murphy’s law.

Another nice side effect of managing all changes in a change control system is that if you notice a performance issue that started on July 27, you can use the system to check what changed on that day. It doesn’t guarantee a solution, but it could be a lead.

All in all, I strongly recommend having a good change control process which include a review of the change system by a second pair of eyes. The benefits are clearly worth the overhead.

 

Becoming a better DBA August 10, 2007

Filed under: books,Oracle — prodlife @ 11:49 pm

I’ve read Coskan’s post about invalid DBAs and now I’m a bit worried that I’m nearly invalid. Too many of the traits he mentioned hit too close home.

So, here’s how I plan to recompile my professional skills:

  1. Read the concepts guide. I’ve read parts of it, but it is probably time to give it a more through read.  Pretty much everyone agrees that one cannot be a good DBA without reading it.
  2. Learn to do wait interface tuning. Start by reading  the book and trying out the ideas in the test system. I know the theory, but currently I do all the wait interface analysis using Ignite for Oracle. A marvelous tool, but a DBA should know how to get along without fancy tools.
  3. Get more acquainted with Oracle security. Maybe start by reading Pete Finnigan‘s papers.
  4. Get Certified! This will instantly make me a better DBA, no?

This short list will probably keep me occupied well into the end of the year. Especially the performance tuning part – Millisap’s book is not exactly a short and easy read for the next flight. Hopefully I’ll be able to start the next year as slighly less invalid DBA.

 

 
Follow

Get every new post delivered to your Inbox.

Join 48 other followers