Rethinking RAC
Posted: January 17, 2008
I’ve been working with RAC since my first day as a DBA. My first task was to install a RAC server (it took well over a week), and since then I’ve installed dozens of RAC servers, more than anyone I know. I spend 90% of my time maintaining them.
I’ve had lots and lots of trouble with RAC, but at the end of the day, I love RAC, with all of its marvelous complexity. I love being able to do rolling maintenance quite easily, without any downtime for our customers, and I love the technology and the amazing ideas behind it.
Which is why everyone was very surprised when, in a large DBA and management meeting, I suggested replacing RAC with Data Guard for HA. The reason for the meeting was cost-cutting. The objective was clear: reduce the cost per customer. We do shared hosting (mostly), so to reduce the cost per customer we need to either reduce the cost of a single system or put more customers on one system. Both options are viable, and I’ll also write a post about how to max out an existing system, but it seemed to me that replacing RAC with other HA alternatives would be an immediate way of cutting costs without significantly lowering availability.
Mogens Nørgaard wrote an amazing article a while back: “You Probably Don’t Need RAC”. Since the last meeting I’ve been reading it again and again, trying to prepare a rock-solid case that, without RAC, our system can host more customers with more availability and at lower cost. I’ve also found a good post by the storage guy on the same subject; he is feeling the pain with 11g RAC. The real irony is that while I’m claiming we don’t need RAC at all, I’m still proceeding with our 11g Clusterware tests, because a DBA should always be prepared.
A year later, Mogens wrote another good article, this time about how difficult it is to get his message accepted. He also mentions that the discussion tends to be emotional rather than technical. I hope my experience won’t be as bad as his, but I think it may be worse. On top of the usual difficulty of getting people to take part in a serious technical discussion (preparing a serious technical case is a lot of work, and resorting to rhetoric is much easier), the entire team that made the RAC decision three years ago is still around, and saying “we made the wrong decision and stuck with it for three years” is difficult at the best of times. Oracle sales will also get involved sooner or later. The difference between our RAC and non-RAC costs is very high (especially since we are talking about many servers), and I can’t see Oracle accepting the sudden loss of revenue without a fight.
Maybe Oracle is fighting back already? A while back, Kevin Closson wrote a number of articles about RAC: its performance, high availability, maintenance, and so on. I remember they were very good, but I made the mistake of not saving copies, assuming they would always be there. Unfortunately, many of those good articles are no longer available.
I’m doing Log Buffer again this Friday! Don’t forget to visit for the hottest DB blog posts of the week.