Cloning Oracle Home from RAC to Stand-Alone

Posted: July 31, 2010 | Author: prodlife | Filed under: Linux, Oracle, rants

This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!

This should have been the easiest task on my todo list: Install Oracle 10.2.0.3 EE standalone on a new Linux RHEL 5 server, later to be used as a standby for a production RAC system. This means 2 lines of “runInstaller -silent …”, less than 5 minutes of DBA work and maybe 20 minutes of waiting. I did not expect to spend over 5 hours doing this.

Problems started when I discovered that I don’t have the 10.2.0.3 patchset and another patch that exists on production and should be installed on the standby. I had to wait for my Metalink credentials to be approved for this customer CSI before I could download the patches for them.

“Why don’t you just clone the software from production?” asked a helpful colleague.

Sounds like a great suggestion. I cloned Oracle software before and it is a simple process: tar $ORACLE_HOME, copy the tar file to the new server, untar, run the cloning script which will register the new home with the inventory, and you are done!

In theory, at least.

Here is what actually happened:

Tar, copy, untar, script
Ran OPatch to verify that the new oracle home is in the inventory and that I see the correct version and patches.
OPatch is showing two nodes. Oops. I didn’t realize oracle home has information about the cluster – didn’t Oracle move the inventory elsewhere? Spent an hour looking for the cause of this.
Found that the two nodes are mentioned in $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
Removed this file.
Detached Oracle Home to clean inventory without deleting the software.
Ran the clone script again
Yay! OPatch looks good now.
Decided to create test database to be extra sure everything is fine
NETCA failed with linking error. Spent an hour figuring out why. Cursed a bit.
Had to install glibc-devel, 32 bit version. Too bad RDA didn’t catch this.
Created test database, but SQLPLUS now fails with linking error. More cursing. Wondered what I did to deserve this.
libaio.so.1 was missing so I had to install the 64 bit version of libaio. Too bad RDA was silent about this as well.
Couldn’t start the database because the database couldn’t find the cluster. Why was it even looking for a cluster? Spent an hour figuring out why. Ah, because I copied the software from a RAC server and it was linked as RAC database.
Relinked everything with RAC_OFF option.
Finally things are working. Too bad it is 8pm already.

What I should have done: (I’m not sure if it is supported by Oracle, but at least it works)

Double check that we have all RPMs.
Tar, copy, untar
remove $ORACLE_HOME/inventory/ContentsXML/oraclehomeproperties.xml
run clone.pl: clone/bin/clone.pl ORACLE_HOME=/appl/oracle/product/10.2.0/db_1 ORACLE_HOME_NAME=OraDb10g_home1
Relink as non-RAC: make -f ins_rdbms.mk rac_off
Verify with OPatch.
Create test DB:
netca /silent /responsefile ~/netca.rsp
dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName TST -sid TST -SysPassword xxx -SystemPassword xxxx -emConfiguration NONE -datafileDestination /appl/oracle/oradata -storageType FS -characterSet WE8ISO8859P1 -nationalcharacterSet AL16UTF16 -memoryPercentage 40
Go for a nice afternoon ride.
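
The steps above could be strung together roughly like this. It is only a sketch: the default path and home name are the ones from this post, the "perl" prefix and the extra "ioracle" relink target are my assumptions (ioracle is the usual companion of rac_off) and not something taken from the original run. The function only prints the commands, so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Print (do not run) the post-untar steps for a RAC home copied to a
# stand-alone server. Arguments: ORACLE_HOME path, ORACLE_HOME_NAME.
clone_plan() {
    oh=${1:-/appl/oracle/product/10.2.0/db_1}
    name=${2:-OraDb10g_home1}

    # Drop the file that still lists the RAC nodes.
    echo "rm $oh/inventory/ContentsXML/oraclehomeproperties.xml"
    # Register the copied home with the inventory ("perl" prefix is an assumption).
    echo "perl $oh/clone/bin/clone.pl ORACLE_HOME=$oh ORACLE_HOME_NAME=$name"
    # Relink without RAC ("ioracle" is an assumption, see the text above).
    echo "cd $oh/rdbms/lib && make -f ins_rdbms.mk rac_off ioracle"
    # Verify version, patches and node list.
    echo "$oh/OPatch/opatch lsinventory"
}
```

Calling clone_plan with no arguments prints the four commands for the home used in this post.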

I hope that I’m not the only DBA who always has to find the most difficult way to accomplish a task, and that this post will be useful to others. Perhaps the best piece of advice I can offer is to avoid this type of cloning in the first place.

Lessons From OOW09 #1 – Shell Script Tips

Posted: October 17, 2009 | Author: prodlife | Filed under: Linux, openworld09, scripts, tips, Uncategorized | 15 Comments

During OpenWorld I went to a session about shell scripting. The speaker, Ray Smith, was excellent. Clear, got the pace right, educating and entertaining.

His presentation was based on the book “The Art of Unix Programming” by one Eric Raymond. He recommended reading it, and I may end up doing that.

The idea is that shell scripts should obey two important rules:

Shell scripts must work
Shell scripts must keep working (even when Oracle takes BDUMP away).

Hard to object to that 🙂

Here’s some of his advice on how to achieve these goals (He had many more tips, these are just the ones I found non-trivial and potentially useful. My comments in italics.)

Document dead ends, the things you tried and did not work, so that the next person to maintain the code won’t try them again.
Document the script purpose in the script header, as well as the input arguments
Be kind – try to make the script easy to read. Use indentation. It’s 2009, I’m horrified that “please indent” is still a relevant tip.
Clean up temporary files you will use before trying to use them:
function CleanUpFiles {
    [ -n "$LOGFILE" ] && rm -rf "${LOGFILE}"
    [ -n "$SPOOLFILE" ] && rm -rf "${SPOOLFILE}"
}
Revisit old scripts. Even if they work. Technology changes. This one is very controversial – do we really need to keep chasing the latest technology?
Be nice to the users by working with them – verify before taking actions and keep user informed of what the script is doing at any time. OPatch is a great example.
Error messages should explain errors and advise how to fix them
The same script can work interactively or in cron by using: if tty -s; then …
When sending email notifying of success or failure, be complete. Say which host, which job, what happened, how to troubleshoot, when is the next run (or what is the schedule).
Dialog/Zenity – tools that let you easily create cool dialog screens
Never hardcode passwords, hostname, DB name, path. Use ORATAB, command line arguments or parameter files. I felt like clapping here. This is so obvious, yet we are now running a major project to modify all scripts to be like that.
Be consistent – try to use same scripts whenever possible and limit editing permissions
Use version control for your scripts. Getting our team to use version control was one of my major projects this year.
Subversion has HTTP access, so the internal knowledge base can point at the scripts. Wish I knew that last year.
Use deployment control tool like CFEngine. I should definitely check this one out.
Use getopts for parameters. Getopts looked too complicated when I first checked it out, but I should give it another try.
Create everything you need every time you need it. Don’t fail just because a directory does not exist. Output what you just did.
You can have common data files with things like hosts list or DB lists that are collected automatically on regular basis and that you can then reference in your scripts.
You can put comments and descriptions in ORATAB
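
A few of these tips fit into one small skeleton. This is my own sketch, not something shown in the session: the script name, the -s and -d options and the file names are all made up for illustration. It shows a header comment, getopts, an error message that says how to fix the problem, creating the work directory on demand, cleaning a leftover log file, and the tty -s check.

```shell
#!/bin/sh
# Purpose: skeleton illustrating some of the tips above (hypothetical script).
# Usage:   skeleton.sh -s ORACLE_SID [-d work_dir]

parse_args() {
    SID=""
    WORKDIR="/tmp/skeleton.$$"
    OPTIND=1
    while getopts "s:d:" opt; do
        case "$opt" in
            s) SID=$OPTARG ;;
            d) WORKDIR=$OPTARG ;;
            *) echo "Usage: skeleton.sh -s ORACLE_SID [-d work_dir]" >&2
               return 1 ;;
        esac
    done
    if [ -z "$SID" ]; then
        # Explain the error and how to fix it.
        echo "Error: no SID given. Pass one with -s, for example: -s TST" >&2
        return 1
    fi
}

prepare_workdir() {
    # Create what we need instead of failing, and say what we did.
    if [ ! -d "$WORKDIR" ]; then
        mkdir -p "$WORKDIR" && echo "Created $WORKDIR"
    fi
    LOGFILE="$WORKDIR/run.log"
    # Remove leftovers from a previous run before using the file.
    rm -f "$LOGFILE"
    : > "$LOGFILE"
}

run_mode() {
    # tty -s exits 0 only when stdin is a terminal, so cron runs report "batch".
    if tty -s; then echo interactive; else echo batch; fi
}
```

A real script would call parse_args "$@" first and exit if it fails, then prepare_workdir, and use run_mode to decide whether to prompt the user or just log.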

Group By in Shell

Posted: April 22, 2009 | Author: prodlife | Filed under: Linux, scripts, tips | 19 Comments

Our storage manager dropped by my cube to say that redo log archives for one of our DBs grew from 40G to 200G in the last day. Running “du -Sh” a few times showed that the files were no longer being written as fast, so the issue was probably over. But what was it? And to start, when was it? Once I figure out when the issue occurred, I may be able to find the offending DML in ASH.

So, I need to do something like:

select day,hour,sum(bytes)
from (ls -l)
group by day,hour
order by day,hour

I guess I could have used the output of “ls -l” as an external table and actually run this query. But I’m actually trying to improve my shell scripting skills. There must be some way to do it!

I asked on Twitter, and Kris Rice (of SQLDeveloper fame) pointed me in the right direction – use AWK.

Here’s the result:

ls -l | sed 's/:/ /g' | awk '{sizes[$7 " " $8] += int($5)} END {for (hour in sizes)
{ printf("At %s %d bytes were inserted\n",hour,sizes[hour])}}' | sort

I run ls and use sed to separate the time into fields (Laurent Schneider will kill me, because I’m mixing sed and awk, but I’m not at his level yet…). After the split, the fifth field contains the file size, the seventh contains the day of the month and the eighth is the hour. So I have an associative array called “sizes”, indexed on the day and hour (joined with a space, so that day 1 hour 12 cannot be confused with day 11 hour 2), and each bucket contains the sum of the file sizes in that hour.

Then I loop on the array and print the results, and sort them to make the output more readable.
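
For the record, the same grouping can be done in awk alone. This is a sketch of mine, not what I ran that day. It assumes the usual "ls -l" layout (size in field 5, month in 6, day in 7, HH:MM in 8) and skips lines that don't match it, such as the "total" line or old files that show a year instead of a time. The function name is made up.

```shell
# Sum file sizes per month, day and hour from "ls -l"-style lines on stdin.
group_by_hour() {
    awk 'NF >= 9 && $8 ~ /^[0-9][0-9]:[0-9][0-9]$/ {
             split($8, t, ":")                     # t[1] is the hour
             sizes[$6 " " $7 " " t[1] ":00"] += $5
         }
         END {
             # %.0f rather than %d, so very large sums are not clipped by some awks
             for (k in sizes)
                 printf("At %s %.0f bytes were written\n", k, sizes[k])
         }' | sort
}
```

Usage would be something like: ls -l /path/to/archive/dir | group_by_hour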

A long post for a short script, but I want to make sure I don’t forget it 😉

Anyway, the best part by far was everyone who tried to help with my desperate call on Twitter/Facebook. It was unexpected and very very nice. About 5 people sent tips, ideas and even solutions! I can only imagine that doing my work is more fun than doing their own work, but I appreciate the help a lot.

Unusual IO activity on shared clusterware home

Posted: November 15, 2008 | Author: prodlife | Filed under: hardcore, Linux, nerdism, netapp, nfs | 6 Comments

Sometimes problems exist in a system for years, but only become apparent when you prepare for a big change. This war story begins when our storage admin decided to replace our Netapp disks with new disks, twice as large. It is a cheap way to increase disk space and IO wait times.

While assessing the impact of this change, he found out that the volumes where we put shared oracle home for our RAC clusters have 6000 IO operations per second (IOPS). The data and redo volumes never exceeded 2000 IOPS, so 6000 is quite significant, especially on disks that should be practically idle.

First debug showed that almost all the IO was neither read nor write, but things like “get attribute” and “access”. At this point I discovered that there is almost no way to get any information about IO activity on NFS mounts. I could not see which processes do this activity, nor on which files or directories it was done.

Time to get advice from the experts on Oracle-L. Vasu Balla of Pythian provided the solution:

“Oracle recommends using noac or actimeo=0 options when mounting nfs for Datafiles, Voting Disk and OCR. Noac means “no attribute cache”, which means none of the file attributes are cached in the filesystem cache, which is very much needed for RAC. If you put your shared oracle home also in that mountpoint which is mounted noac, every access to a file in the oracle home requires a physical IO at the netapp. So I recommend moving all software directories ( db oracle home, asm oracle home and crs oracle home etc ) to a nfs mount which is not mounted with noac or actimeo=0.”

What a wonderful explanation. I now understand the issue and know what to do to solve it. It took me about 3 minutes to test this solution on our staging environment, and it worked like a charm.

Unfortunately, both Netapp and Oracle insisted that shared oracle home on Netapp must be mounted with actimeo=0, and that if this is causing me trouble, I should move to local home instead of shared. Only after very long discussions with two experts from Oracle I got a non-official confirmation that the official documentation is probably wrong and that mounting oracle home with actimeo=0 is a bad idea.

To my surprise, my boss agreed to go ahead with the unofficial but working solution and change NFS mounts to remove “actimeo=0”.

So, we schedule downtime on our production RACs, and we change the mount options, and… Nothing happens. At all. 6000 IOPS before and after the change. If I wasn’t so shocked, I might have noticed my professional credibility taking a hit there.

Why didn’t it work on production? For weeks I had no answer. Until our network admin mentioned that I could use rpcdebug to get more insight about the issue. Turns out that NFS is RPC, and that Linux has flags for debugging RPC. By throwing magic numbers into /proc/sys/sunrpc/nfs_debug I could get NFS trace messages thrown into /var/log/messages. Now we are getting somewhere.

Except that it didn’t get me very far. I could see which devices NFS accesses, but I already knew that. I could see that our prod server had many many calls to “getattr”, while our staging system didn’t. To complete my tests I decided to turn off the attribute caching on staging again and compare the logs. Just to see what it looks like when both systems are in the same state.

Strange difference caught my eye: The staging systems had messages saying “NFS: Refresh_inode” which did not exist in production. Tiny difference, but maybe it has an impact? What does refresh inode mean? Time to go to lxr.linux.no and look at the Linux kernel code for clues. I just need to recall which version to look at.

When the lightbulb went off it nearly blinded me. Staging system has Linux 2.4.27, production is running 2.6.9. I was the one who pushed for the upgrade. I said “There are many NFS improvements in the new kernel versions.”

From here it was easy to find the change. In 2.4 the code for getting file attributes from the server looked like this:

 static inline int
 nfs_revalidate_inode(struct nfs_server *server, struct inode *inode)
 {
         if (time_before(jiffies, NFS_READTIME(inode)+NFS_ATTRTIMEO(inode)))
                return NFS_STALE(inode) ? -ESTALE : 0;
         return __nfs_revalidate_inode(server, inode);
 }

Which basically means – get the new attributes if the cache has timed out.

In 2.6 the code changed and the following check was added:

        /* We may force a getattr if the user cares about atime */
        if (need_atime)
                err = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
        else
                err = nfs_revalidate_inode(NFS_SERVER(inode), inode);

Which means that if the user cares about the file’s last access time (atime), we skip the cache timeout check and force a get attribute call to the server. Another IO operation, even if the cache did not time out.

Luckily, the fix is also easy. Just add “noatime” to the nfs mount, to let the kernel know that we don’t care about access times, and therefore it can go back to using the cache.
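
To find which NFS mounts are still missing the option, something like the following could be used. It is a sketch I put together for this post, not a tool we ran at the time: the function name is made up, and it assumes input in /proc/mounts format (device, mount point, filesystem type, options).

```shell
# Print the mount points of NFS mounts whose options do not include "noatime".
# Reads /proc/mounts-format lines on stdin.
nfs_without_noatime() {
    awk '$3 ~ /^nfs/ && ("," $4 ",") !~ /,noatime,/ { print $2 }'
}
```

Usage: nfs_without_noatime < /proc/mounts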

So easy once you know what to look for!

Yet Another Nice Myth

Posted: April 9, 2008 | Author: prodlife | Filed under: Linux, nerdism | 13 Comments

A while ago I explained Nice, what it can do and what it can’t. From this point, whenever someone misunderstood nice, I just pointed at my post.

But yesterday I ran into a misunderstanding that was not covered in the first post:

Boss: Server X is at 100% cpu. Can you renice process 4096?
Me: Sure thing boss. *clickety-click* Done
Boss: Why is cpu still at 100%?

Renice will not lower CPU utilization because:
1. It does not kill processes. After renicing the process will have lower priority, but it will still need the same amount of CPU it did before, and so do all the other processes in the system. At any given moment, our nice process will either wait for CPU or use CPU. Either way, we are still at 100% CPU.
2. You can’t lower the priority to a point that the CPU will prefer to sit idle instead of running that process. Which is exactly what you ask for if you expect renice to affect CPU utilization.

Linux will not let the CPU sit idle if someone wants it. It makes absolutely no sense to make processes wait while the CPU is not doing anything.

Boss: Sysadmin types always say that 100% CPU is not necessarily a bad thing, but we are DBAs and we know that if a machine is at 100% CPU it means trouble.

I hope we can all agree that what we want to prevent is processes having to wait for CPU. Ideally, every process that needs CPU will get it immediately. In this sense, 100% CPU is indeed a bad thing, because either processes are already waiting for CPU (look at load average!) or it can happen any second. But it is not because 100% cpu is the problem, it is because waiting processes are the problem.

Suppose you have a classic OLTP system – lots of tiny, fast, transactions. Usually the system runs at around 60% CPU on its single CPU server, but this morning someone ran a huge cpu-intensive report on this system. You know that this report runs for 60 minutes on an empty system. He begs you not to kill the process, but he doesn’t mind waiting. Now you have two options:

1. Let this process run as usual. Since it is a large report, it could be scheduled for more than its fair share of CPU. Taking maybe 80% of cpu time, leaving 20% for our tiny transactions. Process will finish in 75 minutes. On average each tiny transaction will take three times longer than usual. 75 minutes of 100% cpu, bad performance and complaints.
2. Renice the process. Now it will get 40% of CPU time. Tiny transactions get 60% (which is all they need). Report takes 150 minutes, tiny transactions take as usual. 150 minutes of 100% cpu, no performance issues.

Obviously the second scenario is much better. Even if you spend twice as much time in 100% CPU. So maybe 100% cpu is not always evil, and renice is useful even when it doesn’t lower the cpu utilization.
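
The numbers in the two scenarios are plain arithmetic, which can be checked with a couple of helper functions. The function names are mine, and the percentages are the illustrative ones used above, not measurements.

```shell
# Wall-clock minutes for a job that needs cpu_minutes of CPU
# when it only gets share_pct percent of the CPU.
elapsed_minutes() {
    cpu_minutes=$1
    share_pct=$2
    echo $(( cpu_minutes * 100 / share_pct ))
}

# How many times slower a workload runs when it needs need_pct of the CPU
# but only gets got_pct.
slowdown_factor() {
    need_pct=$1
    got_pct=$2
    echo $(( need_pct / got_pct ))
}
```

elapsed_minutes 60 80 gives 75 and elapsed_minutes 60 40 gives 150, while slowdown_factor 60 20 gives 3, matching the "three times longer" in the first scenario.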

Boss: But look at Top! Things look terrible! How do we know that everything is OK?

It is important to remember that viewing the CPU world through “top” tends to skew our perspective a bit. Top gives CPU utilization aggregated according to its refresh rate – 3 seconds by default. Now during these 3 seconds, 1.2 seconds were used by the huge report (40%!) and 1.8 seconds were used by 20 processes pooling all those tiny 2 ms transactions. Each process will show only about 3% CPU. It looks scary – big bad report taking so much CPU leaving almost nothing for other processes. But each time a transaction wanted to run it got 100% of the CPU. It just finished very very fast, and the big report stayed.

How do we REALLY know that everything is OK?
I trust Oracle on this and go to Active Session History tables (or EM report), find the transaction processing sessions, check out the wait events and make sure CPU is reasonable.

Boss: But what do we do about CPU?

So, 100% CPU is not good, but we can’t use renice to solve this. What can we do?
Killing the offending process is a crude but effective solution. Rumor has it that SQL Profiles will allow you to alter the execution plan of the query, maybe making it easier on the CPU. I never tried it. Of course, the ultimate solution is to optimize the report to use less CPU, and in my experience it is easier to convince developers to optimize after you killed the report several times. In the worst case, buying more CPU power is always an option.

Nice Processes Finish Last

Posted: January 3, 2008 | Author: prodlife | Filed under: hardcore, Linux | 3 Comments

I’ve run into lots of myths and misconceptions about the Linux “nice” command. Lots of people seem to view it as a magic wand that can be waved at misbehaving processes – they are not sure what it really does but maybe it will make the process nicer?

Indeed there is so much confusion around it, that it is incredibly difficult to find good information about it. Especially since the behavior of nice changes between kernel releases and information that was great two years ago, may not be true today.

Fortunately, Linux is open source and one can always read the source code of the relevant kernel version to know exactly what Nice is doing today. I mean it – the relevant code is all in one file (kernel/sched.c) and is very readable.

Since I’ve already done some reading on the 2.4.31 kernel, I’ll summarize here the behavior of nice in that version and then shoot down specific myths. To explain what Nice does I’ll have to start with a short overview of how the Linux scheduler works. Should be entirely painless and it makes things much clearer later on.

The important part of the scheduler is the schedule() function. This runs every once in a while (I’m not getting into that, except to say that since the scheduler takes CPU, we definitely don’t want it to run too often!), stops the process that is taking CPU now and gives the CPU to a new process. The main question is – which process is going to go next. That’s where the scheduling algorithm comes in.

Scheduling algorithm in Linux works in “epochs” – when the epoch starts, each process in the run queue is allocated certain number of CPU ticks that it can use this epoch. When a process has used all its ticks it needs to give the CPU to another process. If it waits for IO, it doesn’t use any ticks and another process can run at that time, but once it finishes waiting it can use the remaining ticks during the same epoch. When all the processes use all their ticks, the epoch is over and everyone gets more ticks.

So, which process runs next? The algorithm is implemented in goodness() function, which returns a number for each process. The higher the goodness of the process, the more likely it is to run next. Goodness is highest for special real time processes, which always run first. Higher for the process currently using the CPU (because it has things on cache and you don’t want to mess up the cache), high for processes with many ticks left as opposed to those with only few ticks, and 0 for processes that are out of ticks.

Now we get to Nice. Nice impacts the number of ticks a process gets from the CPU. We assign ticks in batches that amount to 50ms, and therefore contain different amounts of ticks for different CPUs, but the basic formula in 2.4 is this:

#define NICE_TO_TICKS(nice)     (TICK_SCALE(20-(nice))+1)

TICK_SCALE is where we match the number of ticks to the CPU speed, but as you can see there is an inverse relation between nice value and the number of ticks, where nice 19 gets you the fewest ticks and -20 gets you the most, and it is a linear relation – when you move from nice 10 to nice 11 it is the same as moving from -5 to -4. This is no longer true in 2.6 where the slope is different for positive and negative values.
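
To get a feel for the numbers, the formula can be replayed in the shell. This sketch assumes HZ is 100, in which case TICK_SCALE(x) is, as far as I remember the 2.4 source, a right shift by two. Treat that shift as my assumption and check sched.c of your own kernel. The function name is made up.

```shell
# Ticks per epoch for a given nice value, mimicking the 2.4 macro
# NICE_TO_TICKS(nice) = TICK_SCALE(20 - nice) + 1,
# with TICK_SCALE(x) assumed to be x >> 2 (the HZ=100 case).
nice_to_ticks() {
    nice_value=$1
    echo $(( ((20 - nice_value) >> 2) + 1 ))
}
```

Under that assumption nice_to_ticks 0 gives 6, nice_to_ticks 19 gives 1 and nice_to_ticks -20 gives 11. With a 10 ms tick, the default of 6 ticks is in the neighborhood of the 50ms mentioned above.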

To summarize – the nicer you are, the less CPU you get and the less often you will be scheduled. Nice processes will finish last.

What Nice does not do:

Many people (including Linux kernel experts!) believe that “nice 20” means your process will only run if no one else wants the CPU. Not exactly. Every process has to use up all their tick counts before the scheduler starts a new epoch (unless the process is sleeping), so while your nice process will probably run last at each epoch, other processes will still wait for it to finish before the new epoch starts and they get a new tick quota.
Nice will not have much effect on a low utilization system. Your CPU is at 0% now. You start a heavy process (maybe export) with nice 20, what will happen? It will take its usual 70% CPU. Because as long as there are no queues this nice process will be rescheduled again and again. What’s the rest of the CPU doing? running the scheduler, which is now working much harder since it has to reschedule this process again and again with a tiny quota each time.
Top shows 100% cpu utilization, and you see that 80% of it is taken by a single process. Should you nice it? Not always. Top will show IOWait time as part of the CPU time, but processes don’t use ticks while they wait for IO, so if the process is IO bound, it will show a very similar behavior regardless of nice. The behavior is to use very little CPU, wait a long time for IO, and show on Top as a very busy process.
Export takes tons of CPU and I don’t want other processes to wait for it, I don’t mind if export takes a bit longer. Should I nice? Probably not. For one thing, export is IO bound, which makes it insensitive to nice anyway. What is more important – export doesn’t do most of the work. What is doing the work is Oracle’s server process that is opened to work with export, and you actually have to “renice” the oracle child process if you hope for anything at all to happen. What you can nice with pretty good effect is the “zip” process that you run after the export.
And if we mentioned the child processes, keep in mind that when a process forks the child process gets the same “nice” value as the parent, but from that point on you can renice each of them independently.

How high can you go?

Posted: July 6, 2007 | Author: prodlife | Filed under: Linux | 10 Comments

What’s the highest load average you’ve seen on a machine?

Our production databases are running on boxes with two dual core cpus, so we usually have load average of around 3. When someone is doing heavy processing on a machine, the load average goes up to 10 by the time one of the DBAs intervenes and stops the madness. When we had a huge problem tuning the memory for client processes and the machine was swapping like there is no tomorrow, we had a load average of 25 and that was the highest I’ve seen a machine go. Needless to say, at that moment response times were less than inspiring.

You can imagine my surprise when I got a phone call one Sunday morning from the application owner who said “Looks like one of our DBs has load average of 500. It could be a problem with the monitor, but can you take a look?”

Sure I can take a look, I’ve nothing better to do on Sunday mornings.

I took a look, and saw the following:
[oracle@productiondb]# uptime
 11:15:58 up 162 days, 15:58, 2 users, load average: 546.06, 538.21, 522.55

Not a monitor bug then. But the machine seemed surprisingly responsive for the extravagant load average. I connected to the DB with no delays, ran few queries which returned within a reasonable time. Everything seemed fine except for the high load average numbers.

And the numbers were going up. 30 minutes later we were well over 600.

I tried running “top”, which promptly hung; “ps” hung too; “df” worked, so now I know it is not a disk space problem nor an NFS issue.

I peeked in the “/proc” directory and saw that many of the stuck processes are actually “ps” and “bash”. So the hanging ps is probably causing the high load average, not a result of the load average. We have 500 ps processes hanging there, which inflates the load average while letting the database function as usual. More and more ps processes are accumulating because we use ps in most of our automated monitors. Vmstat shows 50% iowait, but that’s obviously related to the stuck processes because the DB shows no more than the usual io waits. So ps hangs while waiting for io? Interesting.

Why does ps hang? Strace reveals the problem:
stat("/proc/538", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/538/stat", O_RDONLY)= 6
read(6,<unfinished ...>

So ps gets stuck on process 538. I wonder what this process is doing. Of course, I can’t use ps, so let’s go straight into the proc directory. Except that cd /proc/538 got stuck too. No way to know what it is then. Can I kill it? kill -9 538 doesn’t hang, but doesn’t do anything either. Only a reboot will clear it.

Now is the interesting part: Should we reboot a perfectly working database just because the OS is doing something interesting? Time for management decision. I escalate to my manager, who escalates to his, who calls our system administrator and the application owner.

Meanwhile the system administrator reveals more interesting information: A few hours earlier the DB moved to a new Netapp cluster. Is this the cause of the strange behavior? How can a move to new storage corrupt the process memory blocks so thoroughly? Well, Redhat will need to answer this question.

3 hours later I get the decision – reboot the server. We took a netdump of the machine and restarted it. 5 minutes later the DB is up and running again with the normal load average and ps works.

We sent the netdump to Redhat. It is now almost a month since the incident and there is still no word on what happened to process 538 and whether we can blame the new Netapp.

5 Unix tools that DBAs don’t use enough

Posted: June 27, 2007 | Author: prodlife | Filed under: Linux | 6 Comments

Oracle runs on an operating system, and very often this operating system is a variant of Unix. I’m pointing out the obvious because I’ve worked with many excellent DBAs who were not aware of this fact. They were masters of using everything that Oracle provided, usually in the form of V$ and even X$ views, but completely ignored the fact that Oracle runs on Unix, and Unix can also supply lots of information that can be used to monitor, diagnose issues or automate tasks.

Here are a few of my favorites, in no particular order:

1) sar and vmstat – reporting current and historical system information, including CPU utilization, IO, memory utilization, time spent looking for memory, load average, etc. I begin almost every performance diagnosis session by checking what these tools can tell me. They never tell the whole story, but they give good hints.

2) strace – This gives a system-call level trace of what a process is doing. You usually want to use it when a process hangs and you want to know exactly what it was doing when it hung. The output looks like this:
open("/proc/25062/environ", O_RDONLY) = -1 EACCES (Permission denied)
stat("/proc/27800", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/27800/stat", O_RDONLY) = 6
read(6, "27800 (pickup) S 1534 1534 1534 "..., 1023) = 224
close(6)
Think twice before running this on a busy production system, as this baby will slow the traced process to a crawl and generate tons of IO.

3) gdb -c – run this when you find a core file in an Oracle directory and you want to know who crashed.

4) find – I’ll need a separate post about this wonderful tool. My favorite usage example is for deleting old user traces that usually just take up space. Let’s delete those that are over two weeks old:
find /orahome/ora6410g/admin/ITGST/udump/ -name "*.trc" -mtime +14 -exec rm {} \;
This is just an example, find can do much more useful tricks. It’s one of those utilities I’ll take when I have to work as a DBA on a deserted island.

5) Perl – That’s the other tool to take to a deserted island. It’s a bit large to be called a tool, it’s an operating system with a programming language thrown in. I use it for everything – our backup procedure is written in perl, we automatically generate reports for our customers from perl, disk space usage projections, etc. I’d write a post about this too, but it’ll probably grow to be a book.
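
Going back to the find example in 4): before letting find delete anything, I like to see what it would match. Here is a small sketch of that idea. The function name and its "delete" argument are made up for illustration, and the -type f test is an addition of mine so that directories are never touched.

```shell
# List *.trc files older than N days, or delete them when the third
# argument is "delete". Defaults: 14 days, list only.
old_traces() {
    dir=$1
    days=${2:-14}
    action=${3:-list}
    if [ "$action" = "delete" ]; then
        find "$dir" -name "*.trc" -type f -mtime +"$days" -exec rm -f {} \;
    else
        find "$dir" -name "*.trc" -type f -mtime +"$days" -print
    fi
}
```

Run old_traces /path/to/udump first, read the list, and only then run old_traces /path/to/udump 14 delete.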

60 seconds about ps

Posted: June 24, 2007 | Author: prodlife | Filed under: Linux

If you use Unix, you’ve probably seen ps once or twice. It’s one of the most useful tools in Unix. I use it to check if Oracle is up or not (by looking for smon process), check if a process I killed is dead or not, look for the parent of a process that is taking too much CPU, etc, etc.

I always knew that this tool has more options than just the good old ps -ef, but only today I found the time to really read the man page and see what they do. I found some cool stuff I want to share.

First, the selectors – you can ask ps to show just the processes that belong to a specific user: running “ps -u oracle” will show all processes that belong to oracle.

There is also an option to select by command name, but it doesn’t work as expected, the command name is not what ps itself will show in the CMD column. So ps -fC oracle will show ora_smon_SID while ps -fC ora_smon_SID will show nothing.

Then, you can ask ps to show you certain columns:
ps -eo pid,ppid,pcpu,pmem,etime,time,comm,args will show you the process id, parent process id, cpu utilization, memory utilization, elapsed time, cpu time, command name and the full command line with arguments.

And the last cool feature is sorting: you can sort by start time, user time, system time, page fault, user id and other useful parameters.
ps -ef --sort utime will show all the processes sorted by user time.

And combining everything I learned today, I can run:
ps -C oracle --sort cutime -o pid,ppid,pcpu,pmem,etime,time,comm,args
and find out that the oracle process that is taking most user time is ora_lms0_SID, but that’s probably because I’m checking this on a rather unloaded system.

Key information is missing from the ps man page – it does not contain the list of parameters that --sort and -o (list of columns) will take. I found the lists here: http://www.ss64.com/bash/ps_keywords.html but I have to assume they exist on some man page, otherwise it is huge neglect by Unix documentation.

Just a simple Hadoop DBA

Adventures with Data and Massively Parallel Databases