I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Kosher Visualization September 23, 2009

Filed under: Visualization — prodlife @ 12:04 am

I’m working on my visualization presentation (OOW unconference, October 12, 4PM – don’t miss it!), and one of the topics I keep rethinking is how to present results of research in a visual way. Especially when the report or presentation is for non-technical management.

It is perfectly easy to take true data and arrange it on a chart in a way that “proves” whatever it is you want to show. But is it wrong? Is there a one true way to display data and everything else is a lie and a distortion?

Let’s look at a handy example: http://www.cunningham.me.uk/wordpress/2007/07/11/how-to-lie-with-statistics-as-shown-by-the-bbc/

In the example (look at the graphs with the red lines), scientists measured a temperature increase of 0.5 degrees over a period of 30 years. The blogger thinks that the first chart lies – because they make a tiny change look scary by changing the scale of the chart. He then shows the “correct” chart where the scale is changed to the point that the temperature increase is barely noticeable.

But is it really that straightforward? Another point of view can be that a temperature change of 0.5 degrees over 30 years is a huge deal and is indeed scary, and that the graph was scaled to make the correct scientific view more visible. By rescaling the graph you are actually obscuring an important truth and misleading the audience.

What is the truth? I’m just a simple DBA, I’ve no idea about global warming.

But when I do research about a performance issue and then I write a report about the results of my research, and I use charts to demonstrate the important points in my results – I find it legitimate to scale the graphs in a way that makes the important points as clear as possible. If my graphs don’t demonstrate my points in the clearest way possible then I’m doing a bad job.

However, to keep myself ethical, I follow a few rules about these modifications:

  1. You are 100% sure, to the best of your knowledge and research, that the point you are making is indeed correct. You are not allowed to hide data just because you did not do a very good job at collecting or analyzing it.
  2. You mention the modification in the report or presentation. You make the original data available to anyone who wants to verify your results.
  3. You have very good reasons for the modifications you did and you feel comfortable presenting them to anyone who questions your charts.
  4. You will be extra careful when rescaling data that is displayed as two dimensional shapes, and make sure that the proportions between the rescaled areas indeed reflect the proportions of the data. In 2D, scaling both dimensions squares the change: a shape drawn twice as wide and twice as tall has four times the area.
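
The last rule can be checked with a little arithmetic. Here is a minimal Python sketch, using made-up numbers (a response time going from 10ms to 20ms) and hypothetical function names, that compares two ways of sizing a square on a chart. Scaling the side by the data ratio inflates the area far beyond what the data says, while scaling the side by the square root of the ratio keeps the areas proportional to the data.

```python
import math


def naive_side(value, reference_value, reference_side=1.0):
    """Scale the SIDE of the square by the data ratio (the misleading way)."""
    return reference_side * (value / reference_value)


def proportional_side(value, reference_value, reference_side=1.0):
    """Scale the side so that the AREA of the square follows the data ratio."""
    return reference_side * math.sqrt(value / reference_value)


# Hypothetical measurements: response time before and after a change.
before_ms = 10.0
after_ms = 20.0

data_ratio = after_ms / before_ms

# Area of a square is side squared; the reference square has area 1.0.
naive_area_ratio = naive_side(after_ms, before_ms) ** 2
proportional_area_ratio = proportional_side(after_ms, before_ms) ** 2

print("data ratio:             ", data_ratio)
print("naive area ratio:       ", naive_area_ratio)
print("proportional area ratio:", proportional_area_ratio)
```

The same reasoning applies to circles and icons: if the reader compares areas, it is the area that has to follow the data, not the width or the height.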

You’ll notice that my advice is somewhat subjective – that’s because I don’t really see an objective way to differentiate between “highlighting an important truth” and “making mountains out of molehills”. You did the research, you know if a 0.002ms increase in storage network round-trip time is a big deal or not, and you should decide how to display it. Obviously, if you manage to find a clear and unquestionable way to display your results, so much the better.

 

Good Stuff September 16, 2009

Filed under: links — prodlife @ 11:14 pm

Oracle Open World! I’ll be there, and so will lots of other cool people. Don’t miss the blogger meetup where we’ll all hang out :) Don’t miss the unconference. The line-up is better than what I see at most events – Greg Rahn, Cary Millsap, Kevin Closson, Rob van Wijk, Alex Gorbatchev, Richard Foote and I will all be there.

Dr. Neil Gunther! One of the top performance specialists. His blog is not easy to read, and is not strictly Oracle related, but I’m always glad I take the time to read it because I learn so much. It’s also quite entertaining (for load testing nerds). For example: “Without knowing any details, I can see is that the test rig was driven into saturation, starting with the first concurrent request! Therefore, the first data points provide all the comparison information. The other measurements are redundant (log axis or no). So, what’s the point of the plot?”. Oh, and he also has good twittings!

Exadata v2! A DB server so fast the only way to describe it is ridiculous! There’s still not a lot of technical information out there about it, but the FAQ is a good start.

Advanced Oracle Troubleshooting Seminar at NoCoug! Unbelievable, but two months before the event 50% of the seats are already taken. If you are interested, you should probably hurry up. Early bird registration ends in a week. Don’t say I didn’t warn you.

Shell tricks! Don’t know about you, but I still do my scripting with BASH. Jared Still posted some useful tricks.

Please post more cool stuff in the comments. Also suggestions for books I should read on my daily 3 hour train commute to OpenWorld will be nice.

 

Two Things Everyone Should Know About Queues September 16, 2009

Filed under: musing, performance — prodlife @ 10:38 pm

If you are in the performance business, you should know a lot about queues. How to use them to find performance problems, predict issues, plan your capacity, model your load test results, etc. Queues are just a part of what you should know and be comfortable discussing.

But what if you are not a performance professional? What if you are a sales person or a manager or a dentist? Do you still need to understand queues?

Obviously not everyone should know queues at a precise mathematical level. But queues are everywhere, and sometimes I wish people around me understood queues better. It’ll make it easier for me to explain things. There are two things I think everyone should know about queues:

  1. If it takes me one hour on average to handle a request, and I get one request every hour – most of the time requests will be delayed due to queueing and backlog. Running your DBAs (or servers, or doctors, or toll-booths) at full utilization with every minute accounted for means queueing and delays.
  2. If there are multiple servers (or DBAs or DMV clerks), the most efficient way to get service is to arrange all the requests in a single queue and have all servers accept requests from that queue. The way supermarkets do it, with a different queue per cashier, is inefficient. Deciding that you want all your requests to be handled by a specific DBA because she is better looking is also less efficient than entering the request in the general DBA queue.
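
Both points can be seen in a small simulation. This is only a sketch under stated assumptions: requests arrive at random (exponential gaps), handling times are random (exponential) too, and in the "separate queues" case each request picks a queue at random without looking at its length. The function name, the rates and the request counts are all made up for illustration.

```python
import random


def mean_wait(num_servers, arrival_rate, service_rate, shared_queue,
              num_requests=20000, seed=1):
    """Return the mean time a request waits before someone starts working on it.

    arrival_rate is requests per hour for the whole system,
    service_rate is requests per hour that ONE server can handle.
    """
    rng = random.Random(seed)
    # free_at[i] is the time when server i finishes everything assigned to it.
    free_at = [0.0] * num_servers
    now = 0.0
    total_wait = 0.0
    for _ in range(num_requests):
        now += rng.expovariate(arrival_rate)      # next request arrives
        service = rng.expovariate(service_rate)   # how long it takes to handle
        if shared_queue:
            # Single first-come-first-served queue: the request is taken by
            # whichever server becomes free first.
            server = min(range(num_servers), key=free_at.__getitem__)
        else:
            # One queue per server, chosen at random (an assumption).
            server = rng.randrange(num_servers)
        start = max(now, free_at[server])
        total_wait += start - now
        free_at[server] = start + service
    return total_wait / num_requests


# Point 1: one DBA who handles one request per hour on average.
wait_full_utilization = mean_wait(1, arrival_rate=1.0, service_rate=1.0,
                                  shared_queue=True)
wait_70_percent = mean_wait(1, arrival_rate=0.7, service_rate=1.0,
                            shared_queue=True)

# Point 2: three DBAs at 80% utilization, one shared queue vs. three queues.
wait_shared = mean_wait(3, arrival_rate=2.4, service_rate=1.0,
                        shared_queue=True)
wait_separate = mean_wait(3, arrival_rate=2.4, service_rate=1.0,
                          shared_queue=False)

print("mean wait at 100% utilization (hours):", round(wait_full_utilization, 2))
print("mean wait at  70% utilization (hours):", round(wait_70_percent, 2))
print("mean wait, shared queue (hours):      ", round(wait_shared, 2))
print("mean wait, separate queues (hours):   ", round(wait_separate, 2))
```

The exact numbers depend on the seed and on the assumptions above, but the ordering should not: waiting is much worse at full utilization than at 70%, and the shared queue beats the separate queues at the same load.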

Spread the word :)

 

I Can Has Training Budget September 11, 2009

Filed under: tips, training — prodlife @ 10:37 pm

We know how it goes – there is a recession, and companies try to reduce expenses. The next thing you know, your training budget is all gone. Or maybe there is some training budget left, but now 6 DBAs share a sum that is not enough for one Oracle University course. How do you convince your managers that paying for your training is the best investment they can make?

Start by convincing yourself. Remember that your manager probably got to his position because he is good at reading people, so if you don’t really want the training, or don’t really believe you need this training, he may see that and you’ve lost. You have to be 100% sure that you want this training because it will really allow you to improve the way you work.

As an example, let’s assume you want to go to a Linux Administration course. It’s an interesting case, because it is not even evident that a DBA should go to such a course.

Then think about your boss for a bit – what parts of the job are most important to him? What are his pet projects? His pet peeves?

Once you have your desire for the course and your boss’s desires in mind, make a list of all the benefits you can see from going to the course. The important thing is to highlight how the things you want to learn will help with the projects that are most important to your boss, or will address his specific pain points.

So, if your boss loves automation, say: “I will learn more Linux shell tools so I’ll be able to write better automation scripts”.
If he is a capacity planning person, say: “I will be able to better monitor the OS so we can be more proactive about provisioning”.
If he is a big fan of RAC, say: “With my improved Linux knowledge, I’ll be able to understand low-level clusterware issues and solve them faster!”

Now you need to decide if you make your pitch face to face or by email. I prefer email. Information I put in the email:
* Course title and instructor (or school name)
* Dates/Times
* Location
* Price
* The list of 3-5 reasons I need this course (as you prepared in the previous paragraph).

Until he makes his decision, keep mentioning once or twice a day how the things you do now will be much better after you take the course: “I still don’t understand how to debug coredumps after the process crashes, but the Linux course may help”, “It takes me 2 hours to copy old files to the second disk, but I’ll probably learn how to do it faster in the Linux course”. Don’t force it, but keep an eye open for opportunities to explain and demonstrate the value you see in the course.

And a questionable tactic that sometimes works: Get an ultra-expensive course rejected before asking for a reasonably-priced course. “I can totally understand you don’t have the budget to send me to Collaborate in Denver, but what about a one day training given by our local usergroup at a nearby location?”. I’m not sure if this tactic works because the manager feels guilty about rejecting my request, or if the lower-price seminar just looks better in comparison. I’m not even sure if I recommend it, really. Consider and act at your own risk ;)

 

OOW09 – Tradition Edition September 9, 2009

Filed under: openworld09 — prodlife @ 10:35 pm

This year will be the third time I’m attending Oracle Open World. When you do something every year for 3 years in a row, you develop a few traditions around it.

Even though I know I always have an amazing time there, I’m always worried beforehand. I remember the commute, the fatigue and the boring marketing content. Somehow the memories of great discussions in the OTN lounge with amazing people are less vivid. So being anxious beforehand is definitely a tradition.

Some traditions do not continue – this year there seems to be no blogger meeting. I guess I’ll need to be a bit more proactive about meeting my online colleagues. Like, email everyone to check if they will attend OOW and ask if they want to date me :) You can also leave a comment here if you want to hang out together.

A tradition I hope not to continue is over-scheduling sessions. I looked for presenters I know, especially those I enjoyed in previous years, and for some Streams and RAC 11gR2 sessions, to make sure I keep on top of my favorite technologies. I made it a habit to attend the “Current Trends in Real World Performance” session – it is consistently the most enlightening session at OpenWorld. I’ll probably rewrite my schedule a few times before the conference, and a few times a day during the conference. That’s traditional too.

I’m excited to continue the tradition (started last year) of giving an Unconference session at OpenWorld. Last year was my first ever Oracle presentation – I gave a live demo of streams configuration and troubleshooting. It was wonderful. This year I feel like a veteran presenter – I gave 4 presentations at conferences in the last year. I am going to talk about graphical methods (under the sexy name – visualization). To be honest, I still don’t know what exactly I’ll talk about. I have lots of ideas – using charts to explore the data and solve problems, using charts to prove a point in reports and presentations, how not to lie or confuse when charting data. I plan for lots of examples. I’m looking forward to cooking all these ingredients into one delicious presentation.

I’m presenting on Monday, 4pm – looking forward to seeing you all there, because meeting amazing people is my favorite OOW tradition.

 

Real Life Block Corruption (Maybe) September 4, 2009

Filed under: musing — prodlife @ 1:28 am

What’s the worst thing that can happen to a database? I think most DBAs will agree that block corruption is a good candidate for the list. When DBAs debate the soundness of their backup policy, corrupted blocks are often used as test cases and rhetorical devices: “Keep just 3 days of backup? But what if a block is corrupted on Saturday and we don’t find out until Monday?”.

Until this week, I only knew about block corruptions from my certification studies and from recovery practices (using dd to corrupt blocks is a common gambit).

We had a block corruption this week. At least, we think we did – neither we nor Oracle support are 100% certain. It was nothing like the textbooks described.

On Saturday, our DB crashed. The error in the alert log indicated a corrupted block. We restarted the DB, and…. did nothing. My manager sent me an email asking me to open a ticket to Oracle about this. I saw the email on Monday, failed to realize the importance of the problem (I suck!) and proceeded to work on other tasks.

On Tuesday the DB crashed again. This time it also spouted endless ORA-600 [2662] error messages once it started. We gave it another restart, and this time it started fine. I did open the ticket to Oracle. Priority 1. We ran a bunch of verifications – RMAN validation, DBV, analyzing a bunch of tables and indexes.

RMAN and DBV did not detect any issues. Full export completed successfully. No one is actually certain this is a block corruption. The only strangeness was an index that appeared in DBA_INDEXES but did not exist when we tried to run analyze. We asked our sys admins to check the machine, the OS and the connected storage.

On Wednesday the server crashed again. Again a corrupt block. Different file this time. Oracle support found that one of the millions of ORA-600 and ORA-7445 errors we’ve seen could be related to a SQL parsing bug and suggested a patch.

We’ve had it. In an emergency 10 hour maintenance, we used export/import to move all the schemas to a new DB server.

We hope this is the end of the problem, but we can’t really tell. Which is exactly how real DBA life is so different from textbook descriptions and recovery practices.

 

Automatic Maintenance Tasks August 27, 2009

Filed under: 11g, tips — prodlife @ 1:48 am

Automatic Maintenance Tasks is a new 11g feature which I recently noticed. It’s a bit embarrassing, since I’ve had 11g in production for nearly a year and apparently I’ve been using the feature all along.

I discovered the feature when I noticed that the automatic statistics gathering job is running several times on a weekend, instead of just once as it did in 10g. Then I discovered that the job has a very strange name starting with ORA$, and that the name changes every time the job runs.

Turns out that Oracle’s automatic jobs are no longer jobs. They are now Maintenance Tasks.

Here’s how Oracle defines the tasks:
“When a maintenance window opens, Oracle Database creates an Oracle Scheduler job for each maintenance task that is scheduled to run in that window. Each job is assigned a job name that is generated at runtime. All automated maintenance task job names begin with ORA$AT. For example, the job for the Automatic Segment Advisor might be called ORA$AT_SA_SPC_SY_26. When an automated maintenance task job finishes, it is deleted from the Oracle Scheduler job system. However, the job can still be found in the Scheduler job history.”

And here’s the reason my statistics ran several times on a weekend:
“In the case of a very long maintenance window, all automated maintenance tasks except Automatic SQL Tuning Advisor are restarted every four hours. This feature ensures that maintenance tasks are run regularly, regardless of window size”

What practically changed? Almost nothing. We had schedule windows in 10g, and the maintenance jobs (not tasks) ran within the defined windows. I’ve no clue why this change was necessary.

It definitely looks like infrastructure prepared for a future cool feature. At present, it just looks weird. For instance:

  • You can’t add tasks. Oracle has 3 predefined tasks – statistics, space advisor and tuning advisor. You can add or remove maintenance windows and define in which window to run each task, but you can’t add your own task.
  • There are lots of seemingly unnecessary definitions around. For example, from the dictionary tables, you can see there are task clients and task jobs. Currently it looks like they are the same thing, since there is a one-to-one relation between them, but it probably won’t stay this way.
  • The documentation doesn’t document much. There are fields such as client attributes with values that are not really explained anywhere.
  • The API is really weak. As I said, you can’t do much beyond enabling or disabling tasks in specific windows.

So far, it looks like this feature adds confusion but no value. I hope Oracle will do something fun with it in the future.

 

I Love NoCoug Training Days August 25, 2009

Filed under: advert, training — prodlife @ 2:08 am

Sometimes, life kind of loops on itself. A circle closes. You find yourself at the same spot you were two years ago, but from a completely different viewpoint.

Flashback two years and a few months back: I’ve recently relocated to the US. I’m somewhat of an Oracle newbie, but I know that there are all kinds of cool DBA stuff going on, and I desperately want to be part of it. My colleagues tell me that HotSoS seminars are the best, so I asked my boss to send me to one. The request was left hanging in the air for weeks, just to be rejected. Because it was far away and I had to fly there and it was just too expensive. I was in tears.

And then I got this email from the local user group – NoCoug. They said they are doing a training day with Kyle Hailey. Kyle Hailey of Oak-Table fame was my hero at the time. I just finished reading Oracle Insights, and I was deeply impressed by his story of the program that could connect to the SGA directly through shared memory. I was all “Wow! Kyle Hailey! Only an hour drive away! And it costs just 250$! My boss will have to approve it! Hell, I’ll even pay for it myself!”.

And my boss did approve it. I went to the training day, and it was amazing. I learned more on that day than in the week-long classes I took when I learned to be a DBA. What I learned then is still useful to me, almost every day on the job.

Obviously, I was deeply thankful to NoCoug for making it possible for me to attend this amazing event for a price my boss agreed to pay.

Back to the present day: I am the training day coordinator for NoCoug. I want to create the same experience for every other DBA in our region: a top-notch training event at a price that won’t make your boss blink.

Everyone who knows me will laugh at the idea of me coordinating a training day. I usually can’t coordinate my own breakfast. But this is so important to me – every DBA should be able to be better at his job by learning from the best experts.

I also knew just who should lead my first training event – Tanel Poder is one of the best experts I know (probably in the top 3 of my personal ranking). His blog, scripts and systematic troubleshooting ideas completely changed the way DBAs work. In a very good way.

And he agreed to give his famous “Advanced Oracle Troubleshooting” in Northern California, and we agreed on dates and prices, and I found a location. The impossible happened and I almost coordinated a training day.

Now I just need people to register so the event can really happen. I desperately want everyone to know about this event. I know it can improve the way people work so much that it’s really a shame if someone misses the opportunity. So even though I’m just a simple DBA and not a marketing expert, I’m going to do my best and annoy the hell out of everyone just to make sure that every single DBA in Northern California will know about this event. I even put a small funny looking ad in my blog.

If you can help me here by spreading the word to your NorCal friends – I’ll really appreciate the help.

P.S:
Advice on how to do non-annoying marketing for the event will also be appreciated. I know some of you have been promoting your own events for years. Please share your experience!

 

Preparing to Clone! August 17, 2009

Filed under: scripts — prodlife @ 10:54 pm

We are moving to a new data center. The new servers are already in the new data center, just waiting for Oracle to be installed on them. We have about 50 new servers to install.

Obviously we want to install them as quickly as possible, taking as little human-work time as possible. To achieve this, we are checking two options – scripted silent installation and installation cloning.

I’m checking the installation cloning part, and I’m using Oracle’s Universal Installer User Guide for the process. One of the first steps in the process is:

At the source, you run a script called prepare_clone.pl.
This is a Perl script that prepares the source for cloning by recording the information that is required for cloning. This is generally located in the following location: $ORACLE_HOME/clone/bin/prepare_clone.pl.

When I tried to run the script, I found out that the oracle user did not have execute permissions on the file. Then I found out that the script had the location of Perl hardcoded to the wrong path. Finally, I found out that the file had the usual amount of comments for an Oracle script, but only one line of code:
exit 0;

I found this incredibly amusing, so I decided to blog on this. While blogging, I took a closer look at the documentation, and found the following comment:

The need to perform the preparation phase depends on the Oracle product that you are installing. This script needs to be executed only for the Application Server Cloning. Database and CRS Oracle home Cloning does not need this.

I guess the joke is on me. Serves me right for not reading the instructions carefully.

 

You Can’t DeDupe Oracle DB Files August 14, 2009

Filed under: Storage — prodlife @ 1:57 am

One of our storage vendors has DeDupe technology. DeDupe is short for deduplication – the idea is that when you have identical blocks on the storage, it will only keep one copy of the block. This is a nice idea that saves on storage. It’s especially good on shared filesystems where many users keep copies of identical files.

Our vendor loves DeDupe and managed to convince my storage manager that DeDupe will lead to amazing storage savings. Even on the DB volumes! No matter how much the DBAs protested that DBs rarely have full blocks that are identical to each other, the vendor kept insisting that many other customers have seen amazing storage savings using this technology on their data files. “Databases have many empty blocks”, the storage manager said after lunch with the vendor “And they are all identical! Think how much space you can save by keeping just one empty block!”.

We agreed to test DeDupe. As expected, we saw about 2% space savings. Not exactly what the storage manager expected.

I wasn’t surprised. Even empty data blocks in Oracle DB files are not really identical. They have a header, which contains a relative address, which makes each empty block slightly different.
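
Here is a toy Python sketch of why that matters. The block layout is entirely made up (a 4-byte address followed by zeros) and is not Oracle’s real block format, and hashing whole blocks is only my assumption about how block-level deduplication decides that two blocks are identical. The point is just that a few different bytes per block are enough to defeat it.

```python
import hashlib
import struct

BLOCK_SIZE = 8192  # bytes; a common database block size, used here as an example


def make_empty_block(block_address, with_header=True):
    """Build a toy 'empty' block: an optional 4-byte address, then zeros.

    Hypothetical layout for illustration only, not the real on-disk format.
    """
    header = struct.pack(">I", block_address) if with_header else b""
    return header + bytes(BLOCK_SIZE - len(header))


def count_unique_blocks(blocks):
    """Number of blocks a hash-based deduplicator would actually have to store."""
    return len({hashlib.sha256(block).digest() for block in blocks})


num_blocks = 1000

# What the vendor imagined: empty blocks that are all zeros.
zero_blocks = [make_empty_block(i, with_header=False) for i in range(num_blocks)]

# Closer to what the post describes: each empty block carries its own address.
addressed_blocks = [make_empty_block(i) for i in range(num_blocks)]

unique_zero = count_unique_blocks(zero_blocks)
unique_addressed = count_unique_blocks(addressed_blocks)

print("all-zero blocks stored after dedupe: ", unique_zero)
print("addressed blocks stored after dedupe:", unique_addressed)
```

With the all-zero blocks the deduplicator keeps a single copy. With an address in every header, no two blocks hash the same and nothing is saved.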

So, no DeDupe. Thought you may want to know, so you won’t have to repeat this experience. Maybe even send a link to your vendor :)

If your experience was different though, I’d love to know. The vendor insisted that he had many customers happily deduping their databases.