Mad Troubleshooting Skillz!

Last week I attended Tanel Poder’s Advanced Oracle Troubleshooting seminar, organized by NoCOUG.

Well, actually it was organized by me, with lots of help from Iggy and the rest of NoCOUG. Organizing a seminar was not trivial, but wow – it was totally worth it!

When the seminar was over, we asked the attendees to fill out a short survey and tell us what was good and what was bad. It turns out there was just a single answer to “What was the best thing about the seminar?” The answer: Tanel.

He is clearly an expert. He loves what he is doing. He can think on his feet and answer all sorts of more-or-less related questions from the audience. He is extremely generous with his time, his knowledge and his scripts (I know lots of DBAs who guard their scripts with their lives; Tanel happily shares his on his blog). Tanel is also funny and entertaining, and the course is pretty well structured. He also talks very fast (and seems somewhat obsessed about not wasting a single millisecond) – you won’t believe how much you’ll learn in two days.

I had some trouble explaining to my team and boss what I learned:
“Well, the first half day was troubleshooting hangs and slowdowns using v$views”
“But you knew that before!”
“We also learned lots of Oracle internals. How the shared pool really works, and how SQL is really processed. Lots of cool stuff.”
“OK, but what is it good for?”
“Knowing how stuff works is always good. Anyway, I also learned to use Unix tools to debug problems. Like dumping the stack of a stuck process to see where it hangs! Also, we learned how to handle free memory issues. Remember that awful leak we had on that test server?”
“Oh! Look! Shiny scripts!”

But the proof of the pudding is in the eating. No one can argue with the fact that last week I managed to troubleshoot and solve two problems that other team members had failed to make much progress on. I did it very quickly too.
Now here is the strange thing – the two problems were in areas of Oracle that Tanel very explicitly did not mention during the seminar. Streams and Clusterware. I did not even use any of his scripts to shoot them. And yet I’m still convinced that the reason I was so effective in solving those problems is directly related to the seminar. How is that? Here are the important non-technical things I learned at the seminar:

  1. Systematic approach – You don’t work off lists, you don’t waste time by looking at random places and you don’t guess (much). You gather symptoms, you use them to pinpoint the problem and you use the pinpointed knowledge to work your way toward a solution. The last part sometimes involves Oracle support. I knew about the systematic approach thing before, but two days of looking at someone demonstrating it makes a difference.
  2. Don’t believe anyone (except the OS) – Users lie, other DBAs lie, even Oracle sometimes lies. Always crosscheck and double check the facts. No one lies intentionally, but the result is still misleading.
  3. Problems have causes. I know it sounds funny, but very often we stop troubleshooting too soon, attributing a problem to mysterious unknown forces or at least saying “well, I don’t know how to know this” and leaving things at that. Tanel went farther than anyone I’ve ever seen by saying “I have to know why this behaves like that” and when Oracle doesn’t tell him, he goes to the OS, or the network, or writes his own tools. That’s a good lesson – don’t take no for an answer.
  4. All DBAs have tons of troubleshooting scripts. Real experts have scripts with very short names and very flexible arguments. They also have a script for reminding them how to use their scripts.
  5. I no longer view trouble as something annoying that wastes my time and prevents me from doing stuff I want to do. Instead every trouble is now cherished as an opportunity to practice what I learned, learn more and polish my skills.

I highly recommend Tanel’s course to DBAs who want to suddenly become the best troubleshooters on their team. It’s not a comfortable position to be in (suddenly a lot more trouble finds its way to you), but it can be lots of fun.


4 Comments on “Mad Troubleshooting Skillz!”

  1. karlarao says:

    …. seems like you are stoked 🙂
    and rev’d up for more troubleshooting…

    and yeah it will be more fun now.. 🙂

  2. > No one lies intentionally
    No, sure no one, we just hide the truth to save our heads :mrgreen:

    > I no longer view trouble as something annoying
    STRIP FOR Sep 24, 2009


    • prodlife says:

      You know, in all the time I’ve been doing IT – only once did I catch someone lying to save his head.

      Maybe I’m not good at catching people, or maybe I work with very honest people.

      • joel garry says:

        I think it must be a combination of not being good at catching people, and being lucky about being in situations where fessing up is considered to be a good thing.

        I haven’t been so lucky. I try to set things up for my local situation to have a good working relationship with people I directly interact with, but beyond that I’ve seen, um, lots of camping.

        Your #3 is the essential difference between (PHB style) management and technicians. If you set yourself up as an expert, people will figure out how to get around incorrect management evaluation of the value of expertise. Eventually, you may see the advantage of using management to meter excess usage. Always an unstable balance, but that’s life in dilbertland.
