Logs and other Instrumentation tools

Attention All Application Developers:

One day, your application is going to be deployed in production.
We both know that even when that fateful day arrives, the application may still be full of bugs, and perhaps have a performance bottleneck or two.
Sooner or later the operations team that owns the production system will notice the bugs and bottlenecks, and they will call you. They will expect you to diagnose the cause of the problem and, depending on the severity of the issue, either suggest a workaround, issue an emergency fix, or implement a solution in the next version.
They will also expect you to do all that without interfering with production work. This means that they will not allow you to upload new executables onto production. They are not doing this to make your life miserable; they are doing it because they are paid to protect the production system from interruptions.

If you don’t want to look like an idiot at that point, you should prepare your application for live debugging.
A log with error messages that you can understand is a terrific start. It is even better when every message carries a date and time. If your application is multithreaded, you want to know which thread wrote which message to the log.
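As a minimal sketch of such a log line, assuming a Python application and its standard logging module: the format string, the "orders" logger name, and the "worker-1" thread name are all illustrative choices, not anything prescribed here.

```python
import io
import logging
import threading

# Assumed layout: timestamp, level, thread name, logger name, message.
LOG_FORMAT = "%(asctime)s %(levelname)-8s [%(threadName)s] %(name)s: %(message)s"


def build_logger(name, stream):
    """Return a logger that stamps each line with the time and the writing thread."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger(name)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def demo():
    """Log one error from a named worker thread and return the captured text."""
    buffer = io.StringIO()
    log = build_logger("orders", buffer)   # "orders" is a hypothetical component
    worker = threading.Thread(
        target=lambda: log.error("could not reserve stock for order %s", 1042),
        name="worker-1",
    )
    worker.start()
    worker.join()
    return buffer.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

Each line then starts with the date and time, followed by the level and the thread name in brackets, so a message can be traced back to the thread that wrote it.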
A nice switch that causes the application to start writing incredibly detailed messages to the log, usually known as a “debug” or “trace” switch, will be highly useful and well worth the two hours it may take you to write one. Make sure your application polls this switch while it is running: the operations people will be unhappy if you tell them you need to restart production to generate a trace.
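One way to build a polled switch, sketched here under the assumption that the switch is simply a flag file whose presence means "debug on". The file path, the function names, and the five-second interval are hypothetical; a database row or a configuration entry would work just as well.

```python
import logging
import os
import threading


def refresh_log_level(logger, flag_path):
    """Set DEBUG while the flag file exists, INFO otherwise; return the level."""
    level = logging.DEBUG if os.path.exists(flag_path) else logging.INFO
    if logger.level != level:
        logger.setLevel(level)
    return level


def start_polling(logger, flag_path, interval_seconds=5.0):
    """Re-check the switch every interval_seconds on a daemon thread.

    Returns an Event; set it to stop the polling thread.
    """
    stop = threading.Event()

    def poll():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            refresh_log_level(logger, flag_path)

    refresh_log_level(logger, flag_path)   # apply the current state immediately
    threading.Thread(target=poll, name="log-level-poller", daemon=True).start()
    return stop
```

With this in place, operations can create the flag file to turn tracing on and delete it to turn tracing off, and the running process picks up the change within one polling interval, with no restart.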
After this, the sky is the limit: write key performance events to a table, allow attaching a client that lets you look at the memory of the application while it is running, add secret APIs for turning on even more detailed debugging messages. I’ve seen them all, and they are all very useful.
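The first of those ideas, writing key performance events to a table, might look roughly like this. It is only a sketch: SQLite stands in for whatever database the application really uses, and the perf_events table, its columns, and the timed_event helper are made-up names.

```python
import sqlite3
import time
from contextlib import contextmanager

# Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE IF NOT EXISTS perf_events (
    event_name  TEXT NOT NULL,
    started_at  REAL NOT NULL,   -- seconds since the epoch
    duration_ms REAL NOT NULL
)
"""


def open_event_store(path=":memory:"):
    """Open the database and make sure the events table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


@contextmanager
def timed_event(conn, event_name):
    """Time the enclosed block and record one row, even if the block raises."""
    started_at = time.time()
    t0 = time.perf_counter()
    try:
        yield
    finally:
        duration_ms = (time.perf_counter() - t0) * 1000.0
        conn.execute(
            "INSERT INTO perf_events (event_name, started_at, duration_ms) "
            "VALUES (?, ?, ?)",
            (event_name, started_at, duration_ms),
        )
        conn.commit()


if __name__ == "__main__":
    conn = open_event_store()
    with timed_event(conn, "load_customer"):   # hypothetical event name
        time.sleep(0.01)
    print(conn.execute("SELECT event_name, duration_ms FROM perf_events").fetchall())
```

Because the events land in an ordinary table, anyone with read access can query for the slowest operations of the last hour without touching the running application.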

OracleNerd gets it.


8 Comments on “Logs and other Instrumentation tools”

  1. starprogrammer says:

    Added bonus: it really helps if the log is relatively easy to parse. This is because a good debugging log will contain quite a lot of generally useful information, out of which only a small part will be useful at the moment you need it.

    Nothing really fancy, but if you can grep the interesting lines, you have just saved yourself additional debugging time.

  2. Arul says:

    There are open source tools already available to do this: log4j, log4plsql… I’ve used log4plsql in the past and found it helpful.

  3. qawishte says:

    Yeah. It is good…

  4. chet says:

    Thanks for picking that up!

    I wrote another one of a similar nature: Instrumentation: DEBUG/LOGGING

  5. prodlife says:

    Hi Arul,

    Thanks for pointing out log4plsql. I should try it.

  6. prodlife says:

    Hi Chet,

    I like your blog in general and your instrumentation series in particular. I tried convincing a few developers to read your blog, because I found it very inspiring (the entire approach of always striving to find the best way to do something). Hopefully it will inspire them to take the database more seriously.

  7. chet says:

    Thank you. I do appreciate your thoughts.

    I still try to get my colleagues to treat it more seriously (the architects, no less)! I don’t know if there is a magic pill or something… that’d be nice, wouldn’t it?


  8. moshez says:

    Shout it from the mountain tops, sister!

    Programmers are usually careful people. Happily, all programmers I work with understand the utility of logging. We write extremely detailed logs, and of course it’s not always enough — but we often have enough data in the system that we don’t need to call for reproductions from the field, and we can understand what went wrong. The APIs for managing debug levels are clumsy, but at least they exist.

    Then, there’s marketing. Which tries to pressure me to reduce space requirements for logs. Because “we’ll be deployed virtualized, and people don’t have that many GBs for logs.” I tried to carefully explain that we have bugs, we have problems, the product is still having the kinks worked out and when customers have problems — we need the goddamn logs to fix them. I am happy and proud that our programming team has achieved good turnaround on field bugs — I intend to work hard to keep it that way (logs aren’t the only part of the reason of course — but they are a non-trivial part of it).
