I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Meeting Mr. Pythian November 30, 2007

Filed under: musing, mysql — prodlife @ 4:17 am

Last night I had dinner at San Francisco with Paul Vallee, CEO of Pythian.

Just in case you never heard about Pythian: They are a company specializing in remote DBA services. There are other companies that do remote DBA, but Pythian is special, because other companies don’t contribute to the community nearly as much as Pythian does.
They have a Pythian blog, with useful technical articles. Incidentally that was the second blog I started reading (The first was Tom Kyte’s). Their second contribution to the community is even more unique -  Log Buffer – the weekly compilation of everything that is hot in the DBA blogsphere.

With so much community interaction going on, it was not surprising to find out that Paul Vallee is on top of every industry trend and has some unique ideas on where the industry is heading. Talking to him for few hours gave me more stuff to consider than I usually encounter in a week or maybe two.

Paul is a huge believer in the future of MySQL. I know a lot of Oracle DBAs think that MySQL is barely a database, but according to Paul there is a huge demand for DBAs that can manage MySQL, because many organizations find out that in some cases Oracle is too much of a database. I have some experience with MySQL and I can’t say I like it much. On the other hand, I like the MySQL community a lot – it is small, friendly and very helpful to newbies. I’m not sure this nice community realize how much more mature Oracle is. Not just in terms of features, also in things like scientific thinking and optimization methodologies.  MySQL community needs people like Tom Kyte, Cary Millisap and Jonathan Lewis, to give structure to the community knowledge sharing and discussions and add deeper and more methodical content.

As I wrote in the beginning, Paul is working hard to anticipate the trends of the database industry, and he thinks that DBAs that define themselves by vendor (Oracle DBAs vs. SQL Server DBAs) are a thing of the past. The same way that developers tend to work in multiple languages through their careers, DBA careers are heading the same way. I’m not sure I agree with this view, I have a feeling that there is a fundamental reason that makes it sensible for DBAs to stick mostly with one vendor, but it is certainly something interesting to consider.

It was not surprising to hear that Paul believes in hiring the best DBAs he can find, the surprising thing was that he seems to think that a DBA team is about as strong as the best person in the team. A team with one very experienced and competent DBA will be significantly better than the same team without him. It makes sense to me, but it is well known among developers that a team is as good as its *worst* developer, so you really want your team to be as strong as you can get while pretty much everyone is on the same level. Another thing to think about.

As you can tell, it was a pretty interesting dinner :-) I’m still a bit stoked that I got to meet a real live CEO, and one that founded a company that I really admire. This is just too cool.

**
I didn’t post suggested links in a while.  Lutz Hartmann wrote about an interesting  initiative from Oracle and Netapp. I was excited about it for a full hour, before our storage manager told me that since we are moving away from Netapp in the near future (It is not Netapp’s fault! We love Netapp! It is a political decision), we won’t be testing the new initiative.

 

Preinstall Checks and a Book Review November 28, 2007

Filed under: books, tips — prodlife @ 2:52 am

This post is two topics for the price of one :-)

1. A smart co-worker discovered that you can use RDA to check the pre-requisites before installing Oracle. What used to be an annoying, time consuming and error prone task, because almost fun.

Before installing the database, get RDA and type: ./rda.sh -Tdv hcve

This will generate a nice HTML page, with a list of tests that ran, which ones failed, why they failed and what you should do about it. Now you can send relevant parts of the report to your system administrator and ask for fixes.

2.  Over the Thanksgiving holiday, I’ve read Joel Spolsky’s “The Best Software Writing”.  It is a difficult book to review, because it is generally a collection of essays, chosen to represent different aspects of the software world, and as such they vary in appeal and quality. Topics include programming languages, programming career, the effort required to ship a product, marketing and management. Naturally, everyone will be interested in a different subset of the essays.

I have some problem with the premise behind the book – Joel seems to believe that there is very little good writing about software out there, because good programmers are bad communicators, and therefore the very best writing should be show cased. I believe that a lot of good programmers, not to mention good managers, are also very good writers, and that the quality of essays in the book was not significantly better than what I get almost daily in my RSS reader.

I was somewhat disappointed about this book because I bought it thinking it will be similar to “Oracle Insights”, a bunch of war stories from the field that are entertaining and educational (and in some cases a bit painful). Unfortunately, the essays that Joel chose were not as good as the stories that OakTable has provided. Despites Joel’s assertion in the beginning of the book on the importance of telling stories, the book lacked vivid real life stories, and in many cases it also lacked meaningful lessons and conclusions.

In short, if you are managing a development team, Joel’s book may interest you. If you are working with databases, Oracle Insights will be a better investment.

 

Dissociated Press November 26, 2007

Filed under: plsql — prodlife @ 7:50 am

There is this nice Emacs game where you take some random text and use random walk to generate a somewhat similar but completely random text out of it. It is called “Dissociated Press“.

Out of sheer randomness, I’ve implemented this in Oracle.

Notes:

  • I’m using varchar to hold the text, which limit us to short and therefore uninteresting texts. CLOB would be better.
  • If there is any serious amount of text involved, you’ll want to create a non-unique index on the first two columns of the markov table after you populate the table, and gather statistics on that.
  • Feel free to criticize my code. That is why I post it. But note that I’ve broken some lines so it will look nice in the blog.
  • Also feel free to rewrite the entire thing with a single model statement.
  • The algorithm should be pretty straightforward, but you can read about a similar implementation here.
CREATE TABLE markov
       (c1 VARCHAR(30),
       c2 VARCHAR(30),
       c3 VARCHAR(30));

DECLARE
        cnt BINARY_INTEGER;
        i BINARY_INTEGER;
        my_str VARCHAR(4000) := 'A bug that has been documented.
To call something a feature sometimes means the author of the
program did not consider the particular case, and that the program
 responded in a way that was unexpected but not strictly incorrect.
 A standard joke is that a bug can be turned into a feature simply
 by documenting it (then theoretically no one can complain about
it because its in the manual), or even by simply declaring it to
be good. Thats not a bug, thats a feature is a common
catchphrase.';
        my_comma_str VARCHAR(4000);
        my_table dbms_utility.uncl_array;
        num_itr INTEGER := 50;
        tmp_c1 VARCHAR(30);
        tmp_c2 VARCHAR(30);
        tmp_c3 VARCHAR(30);
BEGIN
     select REPLACE('"' ||
	REPLACE(my_str,' ','","') || '"',',"",',',')
	INTO my_comma_str FROM dual;
     dbms_utility.comma_to_table(my_comma_str, cnt, my_table);
     FOR i in 3..cnt LOOP
         INSERT INTO markov (c1,c2,c3) VALUES
	(replace(my_table(i-2),'"',''),
	replace(my_table(i-1),'"',''),replace(my_table(i),'"',''));
     END LOOP;
     COMMIT;
     SELECT c1,c2,c3 INTO tmp_c1,tmp_c2,tmp_c3 FROM markov
     WHERE ROWNUM=1;
     DBMS_OUTPUT.put_line(tmp_c1 || ' ' || tmp_c2 || ' ');
     FOR i IN 1..num_itr loop

     	 SELECT c1,c2,c3  INTO tmp_c1,tmp_c2,tmp_c3 from
		  (SELECT * FROM markov WHERE c1=tmp_c2 AND
		  c2=tmp_c3 ORDER BY dbms_random.VALUE) t
		  WHERE ROWNUM=1;
		  DBMS_OUTPUT.put_line(tmp_c2 || ' ');
	 END LOOP;

END;

**
This post has been partially inspired by this one.

 

Common mistakes in RAC installation November 24, 2007

Filed under: rac — prodlife @ 9:22 pm

This was supposed to be my OpenWorld Unconference session, which I didn’t give partially due to shyness and partially because I preferred to spend my time listening and learning.

I’m probably the worlds expert on failed RAC installations. I started my career as a DBA by spending four days with a consultant failing to install RAC in our test environment. In the three years that passed since that fatefull week, I’ve probably failed installing RAC over fifty times (I’ve succeeded quite a few times too), so I’m well qualified to tell everyone how to fail installing RAC.

So, how do you completely screw up your RAC installation?

  1. Don’t use the installation guide. Thats a common mistake done by both beginners and experts. If you don’t follow your RAC installation guide closely, your RAC installation will fail. The installation is simply too complicated to do from memory or by hunches. That is the most important thing to remember. The rest of this post will just contain common consequences of not following the installation guide. Also keep in mind to match the version of the installation guide to the version of RAC you are actually installing, because some things change with time.
  2. Your nodes don’t see each other. Huge mistake. Your nodes should be able to connect to each other by name, ip and fully qualified domain name, through public ip and interconnect ip. Verify with pings. Also make sure your host name is spelled the same everywhere – some parts of the installation are case sensitive.
  3. Don’t verify that all your RPMs are installed before beginning the installation. Unfortunately, this is a very easy mistake to make, because the RPM list in the installation guide is somewhat incomplete. There are metalink articles that attempt to correct the mistakes, so look for them. Keep in mind that at least in 10g, the prerequisite check didn’t cover all the required RPMs, so if you mess this step you will end up with a rather random error during the installation.
  4. Ask your network manager to configure the VIP in Linux before your install your cluster ware. Don’t. Just ask him for an IP – Oracle has a VIPCA utility that will configure and manage the VIP for you. If Linux already controls the VIP, RAC installation will fail.
  5. Configure SSH incorrectly. SSH configuration is a somewhat tricky part. Remember that your nodes should be able to ssh each other with user oracle without ssh asking for password or saying anything. ssh remotenode date should just give the date.
  6. Different times for different nodes. All nodes should show the exact date and time.
  7. Bad permissions on shared storage. Verify that root on all nodes has write access to the voting disk.

Thats what I recall right now. I’m sure there are lots more.

 

Small shortcut in a large report November 22, 2007

Filed under: design, plsql — prodlife @ 1:40 am

Came back from Oracle OpenWorld to a flurry of business demands. End of Quarter is here and with it new demands for reports and data.

The main report that I’m writing is rather large, generating the data requires about 30 different queries on about 10 databases. Some run in few seconds while others, despite my best tuning efforts can take an hour or more.

So, I wrote a PL/SQL procedure that runs all the queries on all the DBs and collects the results into a single table that I’ll use in the report. That’s pretty cool, but suppose there is an error in query number 23? After I fix the error, I have to run the report again which takes hours. Quite annoying.

So, I created a small table:

CREATE TABLE PROGRESS
( "STAT_NAME" VARCHAR2(30),
"SCHEMA_NAME" VARCHAR2(100),
"TIME_DONE" DATE)

And now, before each query runs, I check in the progress table if it was already done or not. If it was already done, I don’t need to do it again. If it wasn’t, I run it and after it ran successfully, I add it to the table.

Here’s the procedure that takes care of this process:

create or replace
PROCEDURE CONDITIONAL_EXEC ( in_stat_name IN VARCHAR2,
in_schema_name IN VARCHAR2,sql_stmt in varchar2) AS

stat_status int;
BEGIN
select count(*) into stat_status from progress where schema_name=in_schema_name and stat_name=in_stat_name;
if stat_status=0 then
dbms_output.put_line ('About to run: ' || sql_stmt);
execute immediate sql_stmt;
insert into progress values (in_stat_name,in_schema_name,sysdate);
commit;
end if;
END CONDITIONAL_EXEC;

Advantages of this approach:

  1. I save time by not running the same stuff twice
  2. I can always “undo” something that already ran by deleting from that table
  3. I can watch the progress of my report in the table
  4. The table contains the finish time of each query. This means I have rough information on how long each query took to run and I can use this information to decide which queries need more work.
  5. I can also use the time information to show the business how much db time each statistic they request costs them.Maybe they’ll decide that some statistics are not that important? Especially those that are carefully crafted so we can’t use any existing aggregated data to produce them. Unlikely, but who knows?

Not bad for 5 minutes of work.

 

Thoughts about Oracle OpenWorld November 18, 2007

Filed under: openworld07 — prodlife @ 10:17 pm

I hate to admit it, but the only reason I went to Oracle OpenWorld was that none of the senior DBAs in the team wanted to go. The Boss agreed to send one DBA to the event, and if any of the more senior staff were interested, I’d have stayed at the office.

Fortunately, the older DBAs hated Oracle OpenWorld. They found it too crowded, too intense, too marketing oriented and not technical enough. And they are right, it is all this.

But to me, Oracle OpenWorld was pure magic.

I spend most of my days in a mundane DBA world – we export and import, we backup and recover, sometimes we get to tune some bad SQLs. After a year of this, and you get convinced that being a DBA is slightly boring.

And then, for one magical week, Oracle becomes exciting again. You meet people who are excited about what they do with Oracle – whether it is RAC, indexes, Model clause or ASM. What’s even better – in the Oracle Magic Kingdom, everything works. All the new features have no bugs, all patches install easily according to metalink instructions, Oracle support listens to your problems and has the right answers, you have sufficient maintenance windows to perform all your tasks, and performance issues are interesting and yet can be resolved within 45 minutes. Magic.

My favorite part was meeting people who are as excited about the database as I am. The unconference sessions and the blogger meetup was especially good for this. Most of the DBAs I meet on a regular basis view databases as their job, they think I’m rather strange and geeky for getting excited about new features and talking non-stop about storage options. Not to mention that my strange habit of playing with the database on weekends is considered as a symptom of a sever mental illness. In Oracle OpenWorld I got to meet a bunch of people who are every bit as geeky as I am, many are even more so. Saying “Wow! I’ll play with this feature on the weekend” did not cause a single person to take pity on my lack of social life.

I hope my excitement about OOW didn’t cause some of my senior co-workers to decide to go next year, because I want to go again.

 

Last day at Oracle OpenWorld November 18, 2007

Filed under: openworld07 — prodlife @ 9:43 pm

10:00 – RAC Internals – Fascinating session. The speakers covered some of the mysterious clusterware processes, explained how RAC manages group membership, how it protects us from the horrible split brain (I wish they had more details about IO fencing), what it OCR and how to manage it, and finally VIP explained! Did you know that when a node fails and the VIP moves to a different machine, the entire purpose of the exercise is to drop connections faster? That the VIP on the new node can’t accept connections? That was entirely new to me.

11:30 – Direct NFS - There were 3 speakers sharing the session, but Kevin Closson is by far the best of the three. He is simply an amazing speaker, probably the best I’ve heard in OOW. So, why use DNFS instead of traditional NFS?  It is easier to manage and configure, you can run Oracle with NFS on Windows (no idea why anyone would do that, but Kevin thought it was rather cool), and if you have multiple NICs for your NFS, DNFS will scale far better than the old “teaming”.

I’ve had some spare time before my next session, and I spent it by buying few books in the Oracle Bookstore. Two RAC books (because I’ve obviously didn’t satisfy my need for RAC information in this conference), one PL/SQL book and one “Best Software Writing” book for some light reading.

2:00 – Laurent Schneider’s unconference session about SQL Model -  SQL Model statement is a bit complicated, I’ve looked into it several times in the past, only to promptly disregard it as a nice idea but something too complicated to use in real applications.
With Laurent explaining it, SQL Model suddenly becomes accessible and practical. As something I may use when generating my next report and that will make things easier rather than more difficult. The session was really that good. He even posted the examples he used in his blog, so everyone can review it.

 

Oracle OpenWorld – Day 3 November 18, 2007

Filed under: openworld07 — prodlife @ 9:22 pm

9:45 – Current Trends in Database Performance – A great start for what will be a great day. Andrew Holdsworth gave a fascinating session with many practical take aways. I wish I could summarize his talk here, but it would take me an hour and several pages, so maybe I’ll discuss it a bit in a later post.

11:15 - Best Practices for DBAs - I generally dislike the idea of best practices, and would have skipped the session except that I liked Arup Nanda’s articles in OTN and wanted to attend his session. It was a mistake. While Mr. Nanda is an energetic and engaging speaker, I still have a problem when some random practices of differing qualities are being promoted as being the best.

3:00 – RAC Performance Management - It would have been a terrific 3 hour session, but as a one hour session it simply went way too fast for me to understand. I got the part about diagnosing interconnect problems, but when they got to GC contention, congestion and latency – I simply could not follow. I wish Oracle would offer a one day seminar on the topic. I would love to attend that.

4:00 – Lutz Hartman’s unconference session about ASM – Made me wonder if migrating to ASM is really that simple, or perhaps it only seems so when an expert like Lutz is doing this.

 

Oracle OpenWorld – Day 2 November 17, 2007

Filed under: openworld07 — prodlife @ 1:12 am

Of course, with all the excitement of Oracle OpenWorld, I’ve had no time to edit and post my notes during the conference itself, so you are getting everything in a big batch now.

12:15 – RAC SIG expert panel – It was supposed to be a great opportunity to ask RAC questions and have the very people who wrote the product answer. In reality it was a big disappointment. I didn’t get to ask my cool questions (mostly about load balancing and fail over mechanisms) while other people asked questions that were completely unrelated to RAC, like how to move datafiles. I could see that even the experts were rather disappointed.
I compensated for the disappointment by strolling down to the demo booths later and asking my questions there. I also bought two RAC books, just to be extra sure.

2:00 – 11g & ComplianceDan Morgan and Victoria Whitlock scheduled an unconference session about compliance with SOX and the likes in 11g. Due to some quirk of luck, I turned out to be the only attendant of this wonderful session. Everyone else will have to travel all the way to Seattle to learn how 11g makes SOX compliance that much easier. Dan Morgan told me about the wonderful Audit Vault – which was just what we needed for a big auditing project and we didn’t hear of it before, and also how to use 11g’s PL/SQL warnings to automatically detect many issues with code before I upload it to production. This was by far the most useful session I’ve attended this day.

15:15 – DataPump in 11g – Datapump is one of the best features in 10g, and it got even better in 11g. My favorite improvement is the inline compression of the data dump and the ability to import without uncompressing first. Other nice features are inline encryption, the ability to remap data during the import while keeping certain properties of the data format (for example, generate bogus email addresses for the test system), and the ability to convert LOBs to SecureFiles.

16:45 – Top 11 new features in 11g – This talk was not as excited as I expected. Probably because I’ve already spent few hours hearing about new features in 11g, and also because such a rich set of features deserved more than one hour. 5 minutes per feature are not enough. A nice discovery was that Oracle added pivot operator to SQL, I was not aware of this before the lecture. No more ugly case statements when you need to turn your tables sideways. All the other features were the usual suspects – cache, partitions, real application testing, data pump, compression, yadda, yadda.

The highlight of the day was actually the evening. I went to the blogger meetup, and in addition to drinking excellent beer on Oracle’s expense, I also got to meet the wonderful people that I’ve been writing my morning news for the last year. I have to update my blog roll now because I’ve met few cool people who write great blogs that I haven’t been following up on. It was a great experience to know that Oracle Bloggers are not only smart, creative and great writers, they are also very nice people and its really fun to hang out with them.

One of the things that are kind of sad to me now that OpenWorld is over is that I’ve met all these great people, but they all live in the UK, Australia, Switzerland, Oregon, Canada and what not. There is no way I’m going to meet any of them again for a beer and a chat for the next year, at least. Thats a really depressing thought. As much as I enjoy the virtual connection, talking to people over a glass of beer is much much better.

 

OOW2K7 – Day 1 November 13, 2007

Filed under: openworld07 — prodlife @ 5:12 am

11:00 – GRID Design  – I arrived late to this session due to traffic delays, but I still managed to get most of it. It was interesting, but not exciting. The speaker reminded us to pay close attention to throughput and not just to size, and to design for flexibility. He stormed through everything from buses to application design, but never got any details about any of them, just some general good advice.

1:15 – DBA 2.0 in the No Slide Zone – That was the big disappointment of the day.  the description in the content catalog seemed very interesting: “The next generation DBA will play a more strategic role in managing increasing numbers of complex and business-critical database systems. Learn how to take automation, management and decision-making to the next level”, and I assumed that Tom Kyte will talk about how DBAs can  play a more strategic role in the organization.  This topic seemed interesting enough that I decided to skip Steven Feuerstein’s SQL Addiction talk and attend the DBA 2.0 session.
Unfortunately, the topic was in fact Enterprise Manager, in which I have very little interest, and Tom Kyte didn’t speak, he just played host. I felt significant physical pain as the audience laughed at the old DBA who was editing his scripts in Emacs and then applauded the new DBA who point and clicked his way through Enterprise Manager.
Shame on you Oracle, for the false advertisement.

2:00 – Richard Foote talks about Indexes at the Unconference. That was simply the best technical lecture I’ve heard all day, and maybe ever. Richard Foote is a very engaging speaker. He went over some facts and myths and really made everything much clearer to me. I’ll try to hear him tomorrow as well. It is so much fun to listen to someone who is actually passionate about what he is doing and doesn’t sound like an overtrained monkey.

3:15 – Storage Capacity Planning on the back of the envelope. This session was useful. The speaker briefly explained how to measure storage usage in an existing system and how to use this to estimate growth, and then went on to explain how to interpert the capacity capabilities of the hardware components – CPUs, drivers, devices, FC cards and routers, arrays and disks. Taking all this into account requires a rather large envelope, but it was a good session anyway.

4:30 – 11g Performance features. This was the most exciting session, simply because Oracle added many neat features in 11g. I never imagined that table compression could actually improve performance, I didn’t pay enough attention to secure files and they sound very exciting too, combine this with new cache capabilities, partitioning features and new advisors and all of the sudden I’m actually excited about 11g instead of the usual fear that production DBAs feel toward new versions.

This was a great day. My mind is about to blow from all the new information and this is only the first day of the conference. I hope that tomorrow I’ll be a bit less shell shocked.