I’m going on a two week vacation tomorrow. I’ll be traveling, partying, visiting friends and family and shopping. I will not be anywhere near a production system, and I’ll do my best not to think about databases at all during the vacation. So I don’t expect to write any new posts here until November 7.
More about Change Management October 19, 2007
I’ve written in the past about change management processes, and if you’ve read my old post you know that I’m a huge fan of change management. However, I’ve noticed that while everyone loves change management in theory, most of us hate it in practice.
I suspect that the software we are using for the change management makes it such a pain to manage the changes that people are willing to take huge risks in order to avoid using it. We are currently using a home-brew system for our change management, but we will soon start using a commercial system that should be customized for our use. I’m trying to collect requirements on what the change management software should do. My main goal is to have a system that will not be a pain and will not hinder the change management process, anything better than that is a bonus.
- The system should have good response times. If a user already knows everything he wants to enter into the change request, filling the request should not take more than a minute or two. Users will start feeling physical pain when using the system if the change request form contains pop ups that take 30 seconds to display.
- It should be extremely easy to find changes you are looking for. For every change that someone creates, there will be 3-5 people that will want to read the change and will have to look for it. Make it easy for them and you just doubled the productivity in your team.
- The system should support whatever processes you have in place. Suppose that a change is considered major if it causes a downtime of over two hours, and major changes should be signed off by a customer representative. You want your system to recognize that a change is major and send email to the customer representative with a URL that allows him to approve. You don’t want your DBA to realize this a day before the change, look for the customer representative, ask her to send email with her approval and paste the approval into the “notes” section of the change.
- You want to be able to get useful reports out of the system. What percentage of the changes succeeded? was there a difference between RAC and non-RAC DBs? how many changes were rejected by customers? what were the reasons for rejections? how long does it take to approve a change? what’s the 90th percentile?
- You want to be able to do planning meetings and customer notifications from that system. What changes are going to be done next week? In two weeks? which systems will be affected? were customers notified?
I’m sure that by the time the new system will arrive the list will grow significantly.
*
Netapp are making decisions by consensus – and live to blog about it.
Steve, on the other hand, thinks that the storage business is dead. I hope Netapp will survive this, because they are such a fun vendor with fun company culture.
*
I registered to Oracle Open World 2007! I hope I’ll get to meet few of you there, it can be loads of fun.
Debugging SQL October 17, 2007
I love SQL. Let me count the ways:
- It really fits the relational database model. You only have to work with Hibernate for few days to appreciate the beautiful simplicity of SQL.
- SQL doesn’t have direct memory access. Which means that I don’t have all those freaky bugs that C developers have, where you write random chunks of memory by mistake and get totally random results that are impossible to debug.
- Even more important – the concurrency model is relatively simple and as a bonus the DB will detect and resolve deadlocks for you.
- It is easy to read. There are no “obfuscated SQL” contests for a reason.
- Eye on Oracle got few more reasons, just in case you need convincing.
However, one of the things I really wish I had is a nice SQL debugger, because I’m tired of picking apart complicated SQL statements by hand just to find the bad join that causes duplicate rows or missing rows or whatever.
I realize that debugging SQL is not as trivial as it would be for C, because an SQL statement does not contain a list of well defined operations that one can step though one by one while watching key variables. But there are a bunch of logical set operations that make up each query. Even if Oracle’s implementation of them can get complicated – an “exists in” is logically a nested loop, join is a set product which can then get filtered, etc.
So, why not build a tool that allows you to walk through this operation. The tool should allow you to see the data set after each operation. This way it should be much easier to find the join that breaks everything. After all, we go through a very similar process every time we try to debug a query. It is clear that we need a tool that will support the task that we do so many times a day.
The environmental DBA October 15, 2007
Today is Blog Action Day and everyone is posting about the environment.
As IT workers, we are privileged to have more impact on the environment than most people. While most people may bike to work or recycle some stuff, IT workers can help reduce the electricity usage in their data centers, with a significant impact on the environment.
This year I’ve been involved in a huge project to reduce the electricity usage in our data center. We cut it by more than 30%. Here’s how we did it:
- We got rid of the mirror filer. Well, almost. Over the last 10 years, the mirror filer saved us about 12 hours of downtime and never ever saved our data. Thats not a good use of money or electricity. We used the mirror filer to set up clustered failover (does that count as recycling?) so we still have some high availability, but now each controller is serving a different group of databases, instead of one being a mirror of the other, saving us the need to buy a bunch of new disks for the new databases. I know how some reputable DBAs hate RAID 5 so much they won’t even talk about it, but it seems to work for us just fine.
- VMWare. Most of our application servers are now running on virtual machines. This was not a simple project. Politically, application owners hated it and started blaming VMware for every application issue (actually, I liked that, because they used to blame the DB). Technically, we needed to figure out which applications can be hosted together and which have to be separated. We also had to build a number of management tools to keep track of which machine is hosted on which VMWare. Overall, its worth it, not only for the electricity, but because VMWare is really fun to work with. We used to wait weeks for a new machines – now it is 20 minutes!
- Optimize the code. CPU and IO hogging applications use more electricity. Optimize your code to use less CPU or less IO, and you help not only your users but the environment as well. Efficient code is good for earth!
When I am not allowed to do a good job October 13, 2007
So, we have this customer who has way too many grants and permissions on our database. This kind of thing should have been illegal, but sales sold him a solution that required this and we were forced to cooperate.
Now this customer requires from us about twice as many resources as all the other customers combined. Maybe more. Theoretically he got all those grants so he can administer his own instance and we shouldn’t be involved with him at all, but practically the customer and his DB developers bring the database to its knees every few days, then they blame us and we have to go in, prove it is not us and in the process fix the issue for them.
Now, we’ve been involved in enough of these fires to know much of their code and processes inside and out, we know where they coded to unrealistic requirements (poll ever second a table that updates every five minutes), and where the requirements were ok but the code could be optimized. In fact, we believe we can clean up their code enough to prevent many of these fires.
The customers will be happy if we help them fix the code, they don’t enjoy the frequent problems any more than we do. We’ll be happy to fix the code two – every decent DBA will prefer to fix code rather than spend few hours a week killing processes and writing emails documenting what he did and what the developers should be looking into.
So why don’t we do that? Because the customer didn’t pay for us to do “DBA work” for him. He paid for the instance and for the liberal access, but not for the DBAs. So, DBAs are not allowed to fix any problem with the customer code or do any other DBA work for him. We are only allowed to shoot processes that cause CPU or IO problems on the server itself, send the customer trace files, and reboot the servers when the customer requests it. Never mind that this decision is actually costing us *more* DBA time, and therefore the company can both save money and make the big customer happy by giving him more than he asked for. Never mind that DBAs are usually much happier when they are actually allowed to do their job. The most important thing is that we are following our contracts to the letter.
*
Jeff the SQL Server blogger explains that distinct is not a function. You have no idea how many times I’ve seen this mistake (and even did it myself once or twice).
Problems connecting ASP pages to Oracle October 12, 2007
It has been one of these days. I barely had time to connect my laptop this morning, when an operator appeared in my cube, announcing that an application started giving the following error: [[Microsoft]][[ODBC Driver Manager]] Driver’s SQLAllocHandle on SQL_HANDLE_ENV failed. It started this morning, yesterday it worked fine.
The application owner admitted that he restarted the IIS server on the machine this morning, but swore that he made absolutely no changes. Of course, they never make any changes. God knows what they do all day. Few minutes later it became clear that although every user in the system can start sqlplus, IIS thinks that Oracle drivers are missing, and therefore it cannot connect to our Oracle databases.
I searched the web for information and I found two useful documents:
- http://support.microsoft.com/kb/255084 - Official MS troubleshooting guide. Excellent document which covers every possible cause for the issue. Highly recommended.
- http://www.orafaq.com/forum/?t=msg&th=38079/0/ – This document contains a strange suggestion that worked for many users:If you navigate Oracle home you will find that the directory has different permissions then the root of the tree. In the case of 10G the path is C:/Oracle/product/10.1.0/Client_1. Open the properties for the Client_X directory, and you will see that “Authenticated Users” should have Read and Execute, List Folder Contents, and Read. To fix the problem, do the following….
- -Select Authenticated Users
- Uncheck Read and Execute
- Press Apply
- Check Read and Execute
- Press Apply
You will need to restart the IIS and maybe even reboot the machine after making these changes, and it should solve your problem.
Very strange, but looks like it works in some cases.
My problem looked like a permission issue, but doing what the MS document or the forum suggested did not help, and I didn’t want to reboot the machine.
So, I worked around the problem in a rather unsafe way – I changed the “Log on as” parameter of the IIS web server from “LocalSystem” to “Administrator”. Works like charm.
*
A fellow geek blogged about an extremely unreliable way he found to communicate information to himself. Not database related, but very amusing to all computer geeks.
DBA hallucinations October 11, 2007
Every once in a while something happens that makes me question not only my skills as a DBA, but my sanity as well. Today’s event was one of the stranger ones that happened to me.
A coworker asked me to export a schema and import it into the same database but under a different name. Very common request, not something that is very likely to go wrong. I’m running the expdp command, finished without warnings. Time to run impdp command, with remap_schema to load the data into a new schema, again it finishes with no warnings. I log on to the database and change the password for the new user, again it finished successfully.
Great, I call the co-worker and I tell him his new schema is ready. Two minutes later he is back, and complains that he gets an error logging in. Puzzled, I try logging in as that user myself:
SQL> conn test_user_copy/secret
ERROR:
ORA-01017: invalid username/password; logon denied
Even more puzzled I check dba_users for the user I just created. It is not there. Blinking, I check again. Still not there. I know that I successfully ran “alter user… identified by….” on that user not two minutes ago. It is a large user – dropping the schema would take significantly longer, even if someone was so inclined, which I doubt.
I recheck the import log, and it still seems that the new user was created and the data was imported successfully. Resigned, I reran the import command. It succeeded, which means that the user really was no longer there. I altered the password again, verified that the user still exists and notified the coworker that he can try again. He confirmed that everything is fine, so I guess the user didn’t disappear this time.
Puzzled, I consulted a senior DBA with my problem. He suggested looking at the alert log, as exports and imports also write a rather cryptic line there. I checked. The alert log clearly showed only one import – the second one.
The senior DBA kindly suggested that I was imagining things when I thought I was running the first import, and maybe I should go home early today and get some rest.
I really have no explanation for this event. I’m not even planning on contacting Oracle Support with this story, it is just too strange.
*
While searching for something vaguely NFS related in Google, I found out that someone in Netapp has a blog. I was unable to find out what is he doing in Netapp besides blogging, but I found his posts interesting. I happen to love working with Netapp – both the appliance and the vendor, and the blog really echo the competent and friendly vibe I get when dealing with the company.
Discoveries October 10, 2007
I’m not into cutting-edge things. I drive an old car, use an old computer, I still have Oracle 9i installations hanging around, and I use gadgets only when my boss forces them on me. I like the tried and true stuff. But sometimes even people like me have to realize that the technology world is advancing and they should join the crowd and use the latest innovations (or rather old innovations in my case).
This week I discovered three new and amazing technologies – LCD TV, Facebook and VMWare. I realize that for most of the world these are as new and exciting as last year’s snow, but I managed to avoid them until now.
Our infrastructure team has been pushing VMWare for the last two years, because like every other infrastructure team they are trying to cut on electricity. Production databases make poor candidates for virtualization though, we use all the CPU and RAM on the machine anyway, so I never worked with VMWare up close.
Well, today I finally received a new desktop machine, and I decided to use it to try out VMWare, RHEL4, Oracle 11g and ASM – all at once. I got as far as installing VMWare and RHEL4, and so far so good. VMWare is by far the most exciting technology I’ve used since I found out about Netapp. I spent a good hour creating different machines with different configurations and then deleting them just to see that I can. I add disks and network cards, played with their configuration and then dropped them. It is truely a joy.
Redhat installation also got much easier than I recalled. On the other hand, the last RH I personally installed was 2.1. I was surprised to discover that the “Editors” packages was not part of the default installation – I don’t think Linux is all that useful without VI. I also opted to install MySQL on the machine, another amazing new technology that I should really get more experience with.
I do remember seeing some folks on the blogsphere bragging about how they install Oracle on Linux in under an hour and MySQL in about 10 minutes including the Linux installation. I have no idea how they do it. I already spent around half a day with VMWare and Redhat and I’m still not done. But now that I have VMWare there is no excuse not to practice.
My other two discoveries are not nearly as exciting. TV is nice, you can see movies in it. Facebook is interesting – there are lots of people there, including few VPs from the company I work in, but I still didn’t figure out what I’m supposed to do there.
Lucas Jelma, at the great AMIS blog, describes how to use virtual columns to enforce business rules and gives terrific examples. The post is simply perfect – both innovative and practical.
Tips for Senile DBAs October 5, 2007
I’m not sure anyone except me actually needs these tips, DBAs tend to have phenomenal memory, but I have to write these tips down before I forget them.
When I made my big career change and became a DBA, one of the things I was worried about is that I’ll never be a good DBA because I have rather weak memory. All the DBAs I knew had amazing memory for all kinds of exotic information and details – not just database related, they remembered URLs, IPs, phone numbers and the name of that guy who runs the development team of the product that is currently misbehaving. I can’t do all that, so I have to find ways to compensate.
- To Do lists. Write down all your tasks. I don’t need to elaborate here, do I?
- Write small notes to yourself. Especially if you are interrupted in the middle of a critical task. My coworkers keep asking me if I saw the movie “Memento”, but I don’t care. My memory is good enough to remember the time I was interrupted in the middle of an update, and when I returned to work I remembered I should run the update, but I didn’t remember that I did not write the “where” clause yet. Now I write down “where clause!” on a small note before I answer any question.
- Write a blog. You know how the DBA down the hall always says things like “Ah, yes. We saw this problem with the filer five years ago whenwe did the big Foo upgrade and ordered pepperoni pizza . Just do umount -lf on the partition and mount it again”? Well, they may have a great memory, but if you have a blog you can always write interesting problems and solutions in the blog, and look for them when something looks suspiciously familiar. Actually, the very act of writing the story will increase the probability that you will remember it. Besides, blogging is great fun and you get to meet interesting people.
- Regardless of the blog, keep a knowledge base. It doesn’t need to be fancy. Mine is a bunch of text files in a directory named KB. It contains all the important queries and commands that I use a lot but never remember. The query for getting interesting information for a session based on the process ID, my favorite export parameters, various queries on active session history, favorite ps options. They are all there, because I grew tired of forgetting them.
- Give everything meaningful names. I like reading code that was written by senile programmers, because it is always very readable. They know that unless everything is meticulously documented and very easy to understand – tomorrow morning they will not know what the function should be doing. When you can’t rely on your memory, you rely on your code – which is much better for everyone.
- Refer to the documentation. I rarely remember all the parameters that a functions need, all the options for a command or exactly what is the syntax for starting the archive log on a 9i database. Its a wonder I passed the OCA. This means that I’m constantly working with Oracle’s online books and references. It is obviously slower than remembering everything, and I often envy the coworker who can remember all the options for gather_schema_statistics, but when I read the documentations I often find better ways of doing the job than the one I should have memorized, and sometimes I run into other interesting bits of information.
- Remember that when all your coworkers will celebrate their 50th birthday and will complain about losing their memory, we will just keep doing things the way we are used to without noticing any difference.
I’m sure you all read Tom Kyte’s blog, but just in case you didn’t read todays post “My name is Tom and I’m a terrible programmer”, go and read it now. Not that I believe for a second that he is really a terrible programmer.
Best Practices can be Dangerous October 2, 2007
Secret Geek posted about a specific behavior very common among pretty much all technical folks: The tendency to turn good advice into gospel.
He gives the example of having one exit point per function, advice that was important in the past and lost its importance in some of the newer languages.
There are tons of such “best practices” for developers: Goto are considered evil, multiple inheritance is scary and global variables will cause the western hemisphere to sink into the ocean. All of the above represent very good practices. However, sometimes, when you decide to ignore this advice in favor of other considerations such as code readability, you risk running into opposition that cannot rationally argue the pros and cons of each approach and just keeps quoting the best practice “… but you can’t use a GOTO! It is EVIL!”.
DBAs have their own best practices, of course. Buffer Cache Hit Ratio used to be everyone’s favorite, but now it was so widely discredited that it cannot be used as an example anymore. At least I hope so.
Sometimes working in an organization, you run into some very strange configurations that can only be a result of a best practice, the justification of the practice was long forgotten, but the implementation lives on because no one understands the matter enough (or cares enough) to try and change the configuration.
One such configuration I’ve seen is mounting a single NFS volume to multiple mount points and configuring the DB to use them as though they were separate volumes (i.e. put different data files on different mount points). I’m not sure what effect it was supposed to achieve – maybe better stripping? I do know that it was considered a best practice at some system at some point in time, and no one knows if it is still relevant.
If I were Kevin Closson, I would have figured out the matter in about half a day of work and few well aimed benchmarks. I’m not as good with benchmarks, and I’ve never taken the time to get to the root of the matter, so I have to wonder how everyone else can be so sure of performance benefits from this configuration when they never tested the alternative, and can’t even theoretically explain what can cause such benefits.
And something completely different: Alex Gorbachev of Pythian is having lots of fun at Miracle Open World conference, and he writes about it in the Pythian blog. Boy, does it sound like a great conference.

Subscribe 