Once every few weeks this customer encountered an ORA-1031 error – insufficient privileges. We saw the error in the alert log, and the customer complained about getting the error, but for some reason we never had enough information on what the customer was actually doing when the error occurred. We needed to know what he was doing to figure out how we can fix the issue.
We tried tracing, but the issue occurred so rarely that we were never able to actually capture it on time.
I thought we can solve the issue with error triggers. My suggestion was rejected by my team lead who prefers not to use triggers on production, but I still think it could be useful.
create table caught_errors ( dt date, username varchar2( 30), -- value from ora_login_user msg varchar2(512), stmt varchar2(512) );
CREATE TRIGGER log_errors AFTER SERVERERROR ON DATABASE declare sql_text ora_name_list_t; msg_ varchar2(2000) := null; stmt_ varchar2(2000) := null; BEGIN IF (IS_SERVERERROR (1031)) THEN for depth in 1 .. ora_server_error_depth loop msg_ := msg_ || ora_server_error_msg(depth); end loop;
for i in 1 .. ora_sql_txt(sql_text) loop stmt_ := stmt_ || sql_text(i); end loop;
insert into caught_errors (dt , username ,msg ,stmt ) values (sysdate, ora_login_user,msg_,stmt_); END IF; END;
This is a rather simple script, but I use it so often that it must be good for something.
We don’t use auto-extending tablespaces, because using auto-extending tablespaces puts you at a risk of running out of diskspace. When you run out of diskspace for data files, the database will have a nasty crash and you’ll probably need to recover it. If you run out of tablespace, on the other hand, you just get some errors. We did the math and put some strict limits on the tablespace. This means that when a customer decides to add a significant amount of data without telling us first, we get alerts that we are running out of table space.
We check that we have disk space to grow and at this point we do the following:
1. Using this query, I find the tablespace that is running out of space:
group by TABLESPACE_NAME
sum(BYTES) BYTES ,
group by TABLESPACE_NAME
order by ((a.BYTES-b.BYTES)/a.BYTES) desc
2. Lets assume that TBS_009 is out of space. Using this query, I decide if I want to grow one of the existing data files in the table space, or add a new one:
select * from dba_data_files where tablespace_name= 'TBS_009'
3. If I want to grow an existing data file, I run:
ALTER DATABASE DATAFILE '/voloradata/ora6410g/oradata/dev/tbs_009_01.dbf'
And if I want to add a new one, I run:
alter tablespace TBS_009
add datafile '/voloradata/ora6410g/oradata/dev/tbs_009_02.dbf' size 4M autoextend off;
Tip: Even if you want to create a data file that has 10G space, start by creating a very small data file, see that it is created successfully and then grow it to the size you really need. We had a client that crushed while we were creating a large data file on 10.1.0.3, and it ended up creating a corrupted file, which we had to offline and drop. Oracle confirmed that this is indeed a bug, so try to avoid creating large files.
Few weeks ago, my manager discovered that our training budget disappeared. We are still not sure where it went, but the fact was that our DBA team could not take any course this year. Quite disappointing, as we planned going to HotSos seminar and take a RAC course. The good news was that we couldn’t afford Oracle Week either, and I hate attending this marketing event.
Upper Management thinks that training for DBAs is a waste of both time and money. They think we can learn everything we need from books and online, so when the budget is in trouble, the training is the first thing they cut out. Not very smart. A bit like eating your seeds instead of planting them. I’ve been trying to explain the added value of seminars and courses. The intensity of the learning experience, the value of interacting with other DBAs and hearing about their issues and ideas. I really believe that the value we bring back from HotSoS seminar is well worth the price. He-who-controls-the-budget remained unconvinced.
I was all but resigned to have a boring year without any courses or seminars, when I received news that NoCoug – North California Oracle User Group is having a performance training day with Kyle Haily. The best part is that this training day, which sounds very interesting and useful, costs only 250$. At this rate, I was happy to pay for it out of my own pocket, but my manager came through and got me reimbursed for it anyway.
So now I have a good training day to look forward to. Thank you NoCoug and Kyle Haily. I hope you’ll do this more often.
I didn’t install Oracle 11g. Augusto Bott over at Pythian group installed 11g on Ubuntu, which is really cool considering how Ubuntu is not officially supported and all. What’s the challenge in installing Oracle on a supported platform?
Apparently, the challenge is finding Oracle 11g to install. I’ve no idea how Augusto got his copy. I guess you have to have an insider access to Oracle for it, because I’ve searched OTN today and all I found was a bunch of whitepapers. Fascinating reading, but not what I wanted. I thought 11g was released already, but it doesn’t seem to be fully released.
So, here I am, itching to play with the new toy, and all I can do is read articles about how the other kids are having tons of fun with it.
If any of my kind readers happened to be an Oracle insider and know how to get a copy, please feel free to drop me a line, or better yet – a CD!
Over at the SQL Server blogs, Andy Leonard wrote a nice article praising iterations. I’m often surprised at how often articles about SQL Server are relevant for Oracle DBAs when the databases are so different. This article however, is relevant for everyone. Not just DBAs, all developers.
I agree with Andy that software tends to improve with iterations. I know how I tend to have trouble when I start developing a new package or script, but once I get something written and start running it, debugging and improving as I go, the process becomes quicker and easier as the code gets better and better. Often, at the end, there is very little code left from the first iteration. Everything has been rewritten.
When I look at Oracle, though, I realize that iterations only go so far. There is some level of conceptual integrity, that requires a really strong up front design.
Conceptual integrity is not easy to define. Like beauty, quality and porn, you know it is there when you see it. When the different pieces of software work together in a way that really makes sense. When there are few basic concepts to the software, and when you get those you are rarely surprised by the software behavior, everything falls into place in a very understandable way.
Consider Oracle. No one could call Oracle simple, or easy to use. But, it is really not too complicated either. There is the SGA, cut up into buffer cache and library cache. There is the redo log and its archives. The control file, data files, undo space. Tablespaces and extents. Five basic background processes. You learn all that in a two day architecture course, and after that everything makes sense. Backup and recovery, IO issues, CPU use, locking and concurrency, ORA-01555, latches, everything you do as a DBA is really based on very few concepts. Oracle is really beautiful this way.
And I don’t think this simplicity can be achieved by iterations. It is perfected by iterations, many of them, and it is debugged in many many more iterations, but the ideas have to be in place first. The foundation that the building is based on. Without a strong foundation, the building will end up unstable, no matter how much you will polish and color the walls.
Ibrahimdemir suggested I should describe how we migrated from SQL Server to Oracle. Here are few notes on the subject. Obviously, there are lots of details here – the move took anything from 2 years to 6 month, depends on where you start counting, so I’m only mentioning the highlights.
1) Motivation: The official reason for moving was that Oracle released version 10.2. It became clear that many companies are successfully deploying RAC systems and that it represents a real solution for database high availability. High availability was such an issue with SQL Server that management agreed to the project with surprising speed.
Less officially, we were five Oracle DBAs and we found ourselves managing a production environment that is based almost entirely on SQL Server. We hated that and wanted a way out. RAC was simply a bait for upper management.
2) The applications: Our largest application was supposed to support Oracle out of the box, but we needed to change JDBC drivers to one that supports RAC. During tests we found other errors related to Oracle being different than SQL Server (Lots of issues around case sensitivity, which completely slipped my mind when I wrote the previous post), but mostly the change on the application was rather painless.
Other applications needed to be changed. Anything from small modifications to the connection string, to complete and total rewrite – depending on how the application used different database features.
We had to convert around a dozen applications, and the most difficult part was hunting them all down. Just yesterday, five days before we shut down the SQL Servers, we found two more applications that were somehow forgotten.
3) Deciding to migrate: At first we didn’t plan to migrate at all. We planned to put new customers on Oracle and keep existing customers on SQL Server, with the understanding that at some point the SQL Server environment will die naturally.
We lived with both environments for about a year and around that time two interesting things happened – we discovered that a two node RAC can carry the same load as six SQL Server machines and give significantly better performance, and the data center started pressuring us to use less electricity. The decision was made – move everything to Oracle and shut down the SQL Servers.
4) How to migrate: By the time we decided to migrate, we already knew exactly how the applications should work on Oracle. The correct schema were already in place. The only question was how to move the data. Our development team took a very controversial approach (I’m still not convinced it was the right one), and they wrote two utilities for the move.
The first utility could dump an SQL Server schema to a bunch of CSV files, and then load the CSV files into a ready schema on Oracle side. The utility moved only data, not definitions (table structure, indexes, constraints, etc were not included), but it was OK because we had the right definitions on Oracle already.
The second utility could “listen” to new data coming in to the SQL Server, and copy that data to Oracle. We used sequence (and identity) numbers to get this done.
Using both utilities we could copy a customer over to Oracle, run both systems in parallel for few days until the customer was happy with Oracle and then cut off the data duplication and remove the SQL Server schema.
We moved over 200 customers this way. One by one over the course of 6 month.
5) Summary: We thought we got it all nailed and done. We were ready for the big shut down on Wednesday. However, this morning we discovered two applications that were forgotten. So, shutdown was delayed by two weeks (which are rather costly, because the data center really wants us to cut down on electricity), during this time we have to make the necessary changes in the application, test them and make the move. We hope that this time it will really be it.
We had to apply yet another opatch, but we are only allowed 4 hours of downtime per system per month and we already used our monthly budget, so we need a way to apply an opatch without any downtime.
On some of our systems, it is not an issue. They are RAC systems where each node has its own $ORACLE_HOME on its own server. We take one node down, apply the patch, start the node, stop the other node, apply the patch, start other node. Patch installed on both nodes, no downtime for our customers. Win-Win.
But what do we do about our other systems? The ones which share a single $ORACLE_HOME on a filer? Where we need to take both nodes down for applying the patch?
A co-worker came up with a brilliant idea:
Stop one node. Use the filer power to duplicate $ORACLE_HOME. Connect node to new home, just make the change in /etc/fstab, the database will never notice the difference.
Apply patch in new home. Start database in new home. Now stop the second node and connect it to the new home as well. Start the node in the new home. We have a patched DB with no downtime in a shared home system! We even have a built in rollback – connect one node after the other back to the old home, where we didn’t apply the patch. In my experience rollback of opatches don’t always work, so having a sure rollback plan is a great bonus.
We tested it today in a staging environment and it seems to work well. Now we just need to convince management that we should do it in production. It looks like a great solution, but in my experience management hates approving any plan that does not appear in Oracle manuals. For all their talk of innovation and “thinking outside the box” they are a very conservative bunch. I can understand the extreme risk aversion of IT management, but if you never do anything new, you can never improve, and thats also risky.