Hurray! Hurray! I just passed the OCA exam!
I’ll follow Don Seiler’s idea and try to help other DBAs by describing how I prepared for the exam.
- I was a production DBA for over three years prior to taking the exam. This experience was critical for passing the exam, because it is rather difficult to memorize 500 pages. Experience puts things in context, making them easier to remember. Also, you don’t have to memorize the things you do several times a week.
- I used only the Oracle 10g OCP Certification All-in-One Exam Guide. I can’t say it is perfect, but it was good enough. One thing that really confused me is that the book gave a lot of attention to performing tasks in the DB Console, with tons of screenshots and explanations. I had never used DB Console in my life, and I assumed that familiarity with it would be an important part of the exam, so I spent tons of time studying those screenshots. It turned out to be completely useless.
- I studied for the exam for about three weeks, reading about a chapter every day. I’m not sure this is a recommended method; by the time I finished reading the book and started the sample exams, I discovered that I had forgotten much of the first chapters. Next time I may try taking a week off before the test and really immersing myself in the material.
- The weekend before the exam, I hiked up Mount Shasta. Probably not a good study method, but it was lots of fun and therefore highly recommended.
- The day before the exam a co-worker lent me his study book. I don’t remember which one it was, but I do remember that it was rather different from what I used. I was especially confused by the fact that his book claimed that the Dictionary Cache is used to cache parsed SQL statements and is an optional structure, and that you can’t install Oracle with OUI if you don’t export ORACLE_HOME and ORACLE_SID first. Now, I know Oracle’s memory structures inside out, but a day before the exam these kinds of things can really shake your confidence. I recommend learning from more than one source, but I don’t recommend swapping the source one day before the exam.
- Read every question several times, and then go back and review. During the review I found out that a question I had first understood as “which tablespaces can’t be taken offline” was actually “which tablespaces can be taken offline”. There is no reason to lose points over something silly like this.
Good luck to all of you studying for the exam. I know there is a lot of controversy regarding the certifications – whether they are worth anything or mean anything. I’m pretty sure you can be a bad DBA, study for the exam, and pass. But I also think that studying for the exam made me a slightly better DBA: reviewing the basics is always important; there were tasks I had always done a certain way, and while studying I learned there are other ways to do them; and I never knew how to use profiles, do external authentication, or connect with EZConnect before studying for this exam.
While I keep stressing the importance and value of automating and scheduling tasks such as backups, nightly reports and cleanups, I never really mentioned what a mess the entire system can become when the automation is not managed carefully.
It starts with where you keep the scripts. If you keep all the scripts you need on each machine, each machine will eventually run very different scripts: they will just naturally evolve in different directions, which is no fun to manage and maintain. If you have a central location where you keep all scripts and run them from there, eventually someone will make a change to a script that works on some machines but breaks on others. I recommend working with a central repository, preferably with version control.
Then there is the issue of who owns each script. Unless you believe in complete shared ownership, you will want to record who is responsible for each script and at least notify him when making changes.
There is also the issue of what each script is supposed to be doing. Documentation is extremely important, and good documentation starts with one or two lines at the top of the script explaining what the script is for. We also document the common cases that make the script fail and how to resolve them.
Finally, you will want to record the priority of the script. We have two priorities – 911 and can-wait. 911 means that when the script fails, the operators should wake up the DBA; backups are our only 911 script. Everything else can wait till morning. Before we documented this, we’d get phone calls at 2am about the daily AWR report failing. Remember that operators always err on the side of waking up the DBA, so be extra clear about when not to call you.
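One way to keep ownership, purpose, and priority visible in one place is a short header at the top of every script. A minimal sketch – the field names and example values here are just illustrations, not our actual scripts:

```
#!/bin/bash
# Purpose : nightly export of the reporting schema (hypothetical example)
# Owner   : chen
# Priority: can-wait  (failures can wait until morning; 911 = wake the DBA)
# Known failures: runs out of space in the dump area when the cleanup job
#                 is down; clear old dumps and re-run.
```

Whatever the exact format, keeping it consistent across all scripts means anyone paged at 2am knows immediately whether the call was justified and who to talk to in the morning.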
If you feel that you are drowning in scripts and you spend most of your day looking for scripts and fixing errors that should never have been there in the first place, you are probably not managing your scripts correctly. Even simple steps towards a more organized automation will significantly improve the situation.
It happens every three months – the project managers need to generate their quarterly report. Every three months they decide that the reports they used until now are not good enough, and they come up with a new design for the report. The DBAs then need to get the data, preferably yesterday. After we gather the data and create the report, there is another week during which we try to convince them that the results are correct, and they try to convince us to massage the data a bit to reflect the results they expected.
This time, generating the report requires collecting data from six different databases which together contain about 30 schemas – and one of the databases is SQL Server. Sounds like fun already.
Naturally, I want to write a script that will run all the necessary queries, do all the necessary calculations and aggregations, and dump everything into a table. Then I’ll grab the results from that table and put them into a nice looking Excel (with graphs and colors!).
My only dilemma is: Perl or PL/SQL?
- I know Perl much better than PL/SQL, so I’ll get it written faster and more correctly.
- Perl has no problem connecting to SQL Server – I’m kind of worried it may be an issue with PL/SQL.
- If I do the report in Perl, I won’t have to create all those linked servers to all the other systems. Our security team dislikes linked servers.
- Well, learning PL/SQL was one of my objectives this year – so this is a great opportunity.
- It will probably run much faster. I can’t be sure about it at this stage, but this is usually true.
- Other DBAs can maintain PL/SQL code. I’m the only one who does Perl.
I think that if I find a way to get the SQL Server data from PL/SQL, I’ll go the PL/SQL route. I think it’s called Heterogeneous Services; last time I checked it was quite complicated to use, but maybe it is worth a second chance.
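For reference, the Heterogeneous Services route to SQL Server goes through the generic ODBC agent. A rough sketch of the moving parts in 10g, with all names and paths purely illustrative – an HS init file, a listener entry that starts the agent, a tnsnames alias with the HS=OK flag, and finally a database link:

```
# $ORACLE_HOME/hs/admin/initMSSQL.ora -- HS parameters for a hypothetical DSN
HS_FDS_CONNECT_INFO = mssql_odbc_dsn
HS_FDS_TRACE_LEVEL  = off

# listener.ora entry on the machine running the agent
(SID_DESC = (SID_NAME = MSSQL)
            (ORACLE_HOME = /u01/app/oracle/product/10.2.0)
            (PROGRAM = hsodbc))

# tnsnames.ora on the Oracle side
MSSQL = (DESCRIPTION =
          (ADDRESS = (PROTOCOL = TCP)(HOST = agenthost)(PORT = 1521))
          (CONNECT_DATA = (SID = MSSQL))
          (HS = OK))

-- and from SQL*Plus, a link using illustrative credentials:
create database link mssql
  connect to "appuser" identified by "secret" using 'MSSQL';
select * from some_table@mssql;
```

Still three config files and a link per target system, which is exactly the complication I remembered.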
It is well known that smart tools are sometimes used for purposes their inventors never considered. In fact, that’s one of the reasons we consider these tools smart and well designed. I’m not sure if Codd considered today’s multi-terabyte monsters when he first thought about relational databases, but I am quite sure he never intended the relational database to be used as a random string generator.
Storing passwords in the database is a common practice, and Tom Kyte gives a function that hashes passwords so they can be kept in the database while remaining relatively secret. Nice function, but of course that’s for pre-8.1.6 databases. We don’t have many of those today, so this solution should be extinct by now. Right.
Years back we were using a very similar solution for keeping passwords in the database. The application that originally used this solution is long gone now, but at some point another developer discovered our digest function and found a unique use for it:
When he created a new user, he would use our hash function to generate a complicated-looking string and give it to the user as a password. So user Fred got the output of select digest('fred','welcome') from dual as his password. Of course, now we can’t keep the hashed value in the database, because it is now the real password. No problem – the developer used a Java library to encrypt the hashed string and store that in the database.
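For illustration, a digest function in this spirit can be as small as a wrapper around dbms_utility.get_hash_value. This is a hypothetical reconstruction, not our actual code – the point is only that the output looks impressively random while being completely deterministic:

```sql
create or replace function digest (p_user in varchar2, p_pass in varchar2)
return varchar2
as
begin
  -- get_hash_value maps the input string to a number in
  -- [base, base + hash_size); same input, same "random" output, every time
  return to_char(dbms_utility.get_hash_value(p_user || '/' || p_pass,
                                             1000000000, power(2, 30)));
end;
/
```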
You think this is silly? Imagine what I thought when the owner of this application dropped by my cube and asked me to use the famous digest function to generate a password for a user he created on a Windows machine. Yes, a good old Windows user.
I was so amused by this that I didn’t even try to explain to him that this password is just as good as any string he could invent. People are so easily impressed by those horrible strings!
Jeff Hunter posted an amusing story about how he optimized a query and what the developer did with his suggestions. The story is so cool not because it is one of these horrible WTF stories that are unlikely to happen in any but the most moronic IT organizations, but because this is a rather harmless oddity that happens all the time.
Most developers started their career by studying CS. This means about three years of writing code in Pascal, C, Java or whatever language CS departments prefer these days. After three years of C development you think of every problem in terms of arrays and loops, after three years of Java you think about objects and patterns. You are not even aware that you are doing this, but everything you look at instantly becomes an array with a loop, or a virtual private class.
At least in my days, databases were a highly optional course in the CS department, so the new developer on the team is probably seeing SQL for the first time in her life, after years of converting problems to arrays and loops. Unfortunately, a database table looks exactly the same as an array. The developer’s automatic response is to write a loop that goes over the array. Imagine her horror when she discovers that SQL doesn’t do loops. With a few exceptions, even after learning SQL, the developer will continue thinking of it in terms of loops and arrays. For the procedural developer, everything is a full table scan and nested-loop joins.
At this point, her code will do funny things like fetch an entire table and iterate over the results to count the number of rows. Maybe not that funny, but close enough. In a good development organization, there will be code reviews, and her manager or peers will turn her to the path of efficient database development.
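The classic case, sketched against a hypothetical employees table – the procedural instinct on top, the set-based answer below:

```sql
-- The loop instinct: drag every row into the client just to count them
declare
  l_count pls_integer := 0;
begin
  for r in (select * from employees) loop
    l_count := l_count + 1;
  end loop;
  dbms_output.put_line(l_count);
end;
/

-- The set-based answer: let the database do the counting
select count(*) from employees;
```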
In my experience, it is best if, at this fragile stage, no one tells the developer about PL/SQL. The longer the developer has to work only with set operations within the database, the better she will absorb the ideas of database development and the better DBA she will become. Developers need to be broken of their loop addiction early and brutally. If they find out about PL/SQL too soon, they will always have the crutch of going back to their loops and will never really get databases, or at least the process will be much delayed.
DBAs working with development teams really need to take care in educating their developers. This prevents performance issues before the code is even written, and it is much easier to solve problems at this stage.
Too often, when I search the net for some information I need, I find something completely unrelated that catches my interest. Normal people may bookmark the interesting article for later review and continue searching for the original piece of information. But I am not normal people; I have N.A.D.D. So I’ll read the interesting article, think about it for a bit, then write a blog entry about it, discuss it with friends, and a few hours later remember that I still need to find that original bit.
This time, the interesting article is “Oracle DBA Checklist”. It’s a nice list of things DBAs do daily, weekly, monthly, etc. At first glance it looked like the kind of thing every DBA should have, and I started to wonder if I’m a bad DBA because I don’t have one.
At second glance, I figured out that I hate this list and everything it stands for. No DBA should have such a list. The correct name for the list is “Monitors to set up when configuring an Oracle system”.
The first problem is the entire idea of having the DBA go through a checklist every day, or even every week. This is an unbelievable waste of time. Everything that has to be done at such frequency should be automated during the first week on the job. The list’s author does mention that “some sites may wish to automate this”, but that is not good enough. It should be phrased as “Automate the entire checklist or find another job”.
The second problem is frequency. They suggest checking that the instances are up once a day. I don’t know about their environment, but we can’t afford a day of downtime on our databases. Our monitors check instance availability every 10 minutes. Verifying resources should also be done more frequently – disk space problems can develop very fast and crash your instance, and high CPU can cause severe performance problems. You want to catch these sooner rather than later.
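The checks themselves are trivial to automate; the point is that a scheduler runs them every few minutes, not a human every morning. A sketch of the kind of queries such a monitor might run (the 500 MB threshold is purely illustrative):

```sql
-- instance availability: expect one row with STATUS = 'OPEN'
select status from v$instance;

-- space: alert on any tablespace with less free space than the threshold
select tablespace_name,
       round(sum(bytes)/1024/1024) free_mb
from   dba_free_space
group  by tablespace_name
having sum(bytes)/1024/1024 < 500;
```

Wrap queries like these in whatever scheduling and paging infrastructure you already have, and the daily checklist disappears.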
However, there were a few gems in that document – I loved the suggestion to read manuals for an hour a day. So I decided to write my own mini-checklist, despite my hatred of anything that involves routine and repetition:
- Go through mailbox and dashboard to verify that no alerts were sent during the night from our monitors and scheduled jobs.
- Read DBA manuals
- Go over the blog aggregator for interesting news and posts
- Review reports of CPU, Disk, network, memory and response times for trends, future scalability and ideas for improvement.
The rest of the day can be dedicated to the usual disasters, urgent requests and firefighting. Oh, and maybe some system improvements, in our spare time.
Every DBA should know something about the network (and the OS, and storage, and application development – good DBAs are multi-disciplinary). Probably everyone knows how to debug connectivity issues with ping and traceroute, how to diagnose problems with DNS, and how clients connect to the listener.
DBAs with RAC need to know much more about their network – they have virtual IPs and an interconnect, and usually a storage network too. RAC systems can crash due to network latency problems, and the DBA may need to diagnose this. RAC also has the entire load-balancing and failover thing, where network addresses can move around and you need the LOCAL_LISTENER and REMOTE_LISTENER parameters configured.
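For context, on each instance these parameters look something like the following; the host names, SIDs, and the tnsnames alias are illustrative, not from our systems:

```sql
-- register this instance with its local (VIP) listener
alter system set local_listener =
  '(ADDRESS=(PROTOCOL=TCP)(HOST=node1-vip)(PORT=1521))' sid = 'orcl1';

-- a tnsnames alias resolving to the listeners on all nodes,
-- so every listener knows about every instance and can redirect
alter system set remote_listener = 'LISTENERS_ORCL' sid = '*';
```

The redirect behavior these enable is exactly what makes the story below possible.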
However, it is very rare that things get as complicated as they did this week:
On Thursday we configured a server with LOCAL_LISTENER and REMOTE_LISTENER values.
Almost immediately a customer started experiencing occasional connection failures. Naturally, we (the DBAs) didn’t hear about it until it was escalated to an emergency call on Saturday afternoon. I had a date on Saturday night and didn’t want to be late due to a long debugging session, so I rolled back the change, asked the customer to send me their tnsnames.ora, hosts file, and screenshots of the failure, and told them I’d look into the issue on Monday.
Monday morning arrived sooner than I would have preferred. The tnsnames.ora that was supposed to connect to our database actually contained an address I did not recognize. A quick meeting with the local network manager revealed that these customers connect over a VPN, through a NAT, and with a special firewall configuration on top. Remember I said that every DBA should know networks? Well, I didn’t mean NATs and VPNs. So I don’t know enough about networks, and the network manager doesn’t understand listeners and RAC, but we had to find a solution together.
It appeared that after I configured the LOCAL_LISTENER values, when the listener attempted to redirect their connection to another server, it sent them an address (or maybe an IP?) that their client couldn’t reach, so the connection failed. But why did everything work before we configured LOCAL_LISTENER? According to the documentation we would still have been sending addresses the client can’t connect to, just the local addresses instead of the VIPs. The network administrator had a theory that maybe the NAT translated the local address to something the client understands as it was sent back, but this is really far-fetched.
This is where the sniffer comes into the picture. When you have a complex setup, you have to know exactly what the behavior is – who initiates the connection, what the reply is, where the redirects go, who closes the connection, and how. The sniffer listens to the network and gives you the complete picture.
I use Ethereal, which is graphical and relatively clear and friendly, but still very powerful.
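When debugging Oracle connectivity, it helps to narrow the capture to the listener port so you see only the TNS conversation. A capture filter along these lines (1521 is the default listener port, and the host name is an illustrative placeholder) keeps the trace manageable:

```
tcp port 1521 and host client-machine
```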
In this case, Ethereal was especially useful – using the customer’s network capture, we could easily see that we had been debugging the wrong issue all along. The tnsnames.ora file he sent us belonged to a different machine, one that didn’t experience any problem. The machine that experienced the issue connected to a different IP, which no one really knew about until then. We are still not sure how it is related.
What we do know is that if you have a complicated network configuration that has changed several times, in very inconsistent ways, and that no one documented – a sniffer is your only friend.
So, these days I’m studying for my Oracle certification. I’m studying with a friend, and we use a book from Oracle Press to prepare. We are both experienced DBAs, so we make a game of finding mistakes in the book. It’s a fun game, and it keeps us alert while going over rather boring material.
Yesterday, I read that Oracle doesn’t allocate space for a column, even if it is fixed size, until there is data in it.
While this is certainly space efficient, it seemed like a very difficult way to manage space – you would have to keep moving columns around when people update them. So we suspected that the book was mistaken, and that Oracle allocates the space for a fixed-size column when the row is inserted, even if the column is empty.
Time for verification:
create table t1 (X integer, Y integer);
I created an empty table, with two fixed size columns, and checked the size:
SQL> select owner, segment_name, bytes, blocks, extents
  2  from DBA_SEGMENTS where owner='CHEN';

OWNER      SEGMENT_NAME          BYTES    BLOCKS     EXTENTS
---------- ---------------- ---------- --------- -----------
CHEN       T1                    65536         8           1
Nothing surprising. So let’s insert some data, but only into the first column, keeping the second one empty:
begin
  for i in 1..30000 loop
    insert into t1 (X) values (i);
  end loop;
end;
/
And check size again:
SQL> select owner, segment_name, bytes, blocks, extents
  2  from DBA_SEGMENTS where owner='CHEN';

OWNER      SEGMENT_NAME          BYTES    BLOCKS     EXTENTS
---------- ---------------- ---------- --------- -----------
CHEN       T1                   458752        56           7
Look, the table just got bigger.
Just for the heck of it, let’s see what happens when I insert one row with both columns. Will this be enough to trigger allocation in all rows?
insert into t1 values (1,1);

SQL> select owner, segment_name, bytes, blocks, extents
  2  from DBA_SEGMENTS where owner='CHEN';

OWNER      SEGMENT_NAME          BYTES    BLOCKS     EXTENTS
---------- ---------------- ---------- --------- -----------
CHEN       T1                   458752        56           7
Nope. Nothing much changed.
So let’s update all rows with a value in the second column and see what happens:
update t1 set Y=1;

SQL> select owner, segment_name, bytes, blocks, extents
  2  from DBA_SEGMENTS where owner='CHEN';

OWNER      SEGMENT_NAME          BYTES    BLOCKS     EXTENTS
---------- ---------------- ---------- --------- -----------
CHEN       T1                   851968       104          13
So, the table nearly doubled. The book was correct – space for the second column was not allocated by the initial inserts, only after the update.
Since every row grew when Y was filled in, I assume many rows no longer fit in their original blocks, so the update probably involved migrating a bunch of rows to the newly allocated extents. Something to keep in mind when trying to figure out why an update is using so much IO.
Tonight at midnight, inserts slowed to a crawl on one of our main production databases. This database normally processes about 300 inserts every second, but for about 20 minutes starting at midnight, each insert took almost a minute. The problem resolved itself, but now the application owner wants answers – what caused the slowdown at midnight?
A co-worker ran AWR and found that the top wait event was enq: TM - contention. A lock! This looks promising, but inserts usually don’t lock. That’s one of the most important principles of Oracle – readers don’t block writers and writers don’t block readers. The most locking I’d expect from an insert would be latch contention on a hot block. But my co-worker is a very experienced DBA, and he knew that TM contention can come from an unindexed foreign key, so he started looking for one.
At this point I arrived at the office, and he updated me on the situation.
I don’t like AWR at all, so I took out my favorite tool – Confio. This is a nice graphical tool that helps with wait event analysis, but you can probably get pretty much the same data by digging into v$active_session_history yourself.
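If you don’t have a tool, a query along these lines against v$active_session_history gives a similar breakdown – event, statement, and session program, sample by sample. The time window and the grouping columns here are illustrative:

```sql
select event, sql_id, program, count(*) samples
from   v$active_session_history
where  sample_time between :midnight
                       and :midnight + interval '20' minute
group  by event, sql_id, program
order  by samples desc;
```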
Confio also showed TM contention as the largest wait event, but it also showed the waiting queries (lock table … in exclusive mode) and the machines involved. The locked table and those machines had nothing to do with the insert issue – it was a completely different part of the application. So we could skip this event. I don’t think you can get this kind of information from AWR!
The second and third highest events were “db file sequential read” and “gc current request”. These events are so typical of selects (usually with a bad index) that if I saw them in AWR I’d probably assume they were unrelated, but Confio showed, clear as day, that all the queries with these waits were inserts. I’m not talking about a fancy “insert into … select”; I’m talking about a plain “insert into … values”.
I still don’t know why a plain insert had to wait on “db file sequential read”, why it happened only at midnight, or why it stopped 20 minutes later. But at least now I know where to look!
Note how AWR not only led us in the wrong direction, it also didn’t give us enough information to follow up on the issue – such as the queries, tables, and machines involved.