Rethinking RAC

I’ve been working with RAC since my first day as a DBA. My first task was to install a RAC server (it took well over a week), and since then I’ve installed dozens of RAC servers, more than anyone I know, and I spend 90% of my time maintaining them.

I’ve had lots and lots of trouble with RAC, but at the end of the day, I love RAC, with all of its marvelous complexity. I love being able to do rolling maintenance quite easily, without any downtime for our customers, and I love the technology and the amazing ideas behind it.

Which is why everyone was very surprised when, in a large DBA and management meeting, I suggested replacing RAC with DataGuard for HA. The reason for the meeting was cost-cutting. The objective was clear: reduce the cost per customer. We do shared hosting (mostly), so to reduce the cost per customer we need to either reduce the cost of a single system or put more customers on one system. Both options are viable, and I’ll also write a post about how to max out an existing system, but it seemed to me that replacing RAC with other HA alternatives would be an immediate way of cutting costs without significantly lowering availability.

Mogens Nørgaard wrote an amazing article a while back: “You probably don’t need RAC”. I’ve spent the days since the last meeting reading it again and again, trying to prepare a rock-solid case that our system can host more customers with more availability and lower costs without RAC. I’ve also found a good post by the storage guy on the same subject; he is feeling the pain with 11g RAC. The real irony is that while I’m claiming that we don’t need RAC at all, I’m still proceeding with our 11g clusterware tests, because a DBA should always be prepared.

A year later Mogens wrote another good article, this time about how difficult it is to get his message accepted. He also mentions that the discussion is very emotional and non-technical. I hope my experience won’t be as bad as his, but I think it may be worse. In addition to the usual difficulty of getting people to participate in a serious technical discussion (it is a lot of work to prepare a serious technical case, and much easier to resort to rhetoric), the entire team that made the RAC decision three years ago is still around, and saying “we made a wrong decision and stuck with it for three years” is very difficult at the best of times. And Oracle sales will get involved sooner or later: the difference between our RAC and non-RAC costs is very high (especially since we are talking about many servers), and I can’t see Oracle accepting the sudden loss of revenue without a fight.

Maybe Oracle is fighting back already? A while back Kevin Closson wrote a number of articles about RAC, its performance, high availability, maintenance, etc. I remember they were very good, but I made the mistake of not saving a copy, assuming they would always be there. Unfortunately, many good articles are no longer available.

***
I’m doing Log Buffer again this Friday! Don’t forget to visit for the hottest DB blog posts of the week.


15 Comments on “Rethinking RAC”

  1. Hi, if you lost pages but still know the link, you could try the wayback machine: http://www.archive.org/web/web.php

  2. jarneil says:

    Hello,

    I’ve run RAC since 9.2 was the current release, and in all that time the benefit of rolling maintenance has seemed somewhat overstated. You can’t patch with a node running, certainly not a patchset, and definitely not an upgrade.

    Sure an OS patch and reboot is possible, but have you really benefited *that* much from rolling upgrades?

    Dataguard is HA. It’s a good cost reduction idea!

  3. Dan Norris says:

    I’ve heard a lot of similar campaigns going on inside of companies large and small. It echoes a lot of what I have talked about as well–use the best tool for the job. In this context, “best” is measured by the one that best meets your requirements. If one of your requirements is lower cost and less complexity, then RAC may not be the right choice given its price alone, to say nothing of the complexity.

    Good luck in your discussions there. I’ll stay tuned to hear the outcome.

  4. Steeve says:

    Very good article.
    I was wondering whether the difference in cost between a RAC environment and a Dataguard environment is large.

    Do you also save on hardware costs as well as licenses?
    For example: you have a 4 CPU license on the primary site and a 2 CPU license on the Dataguard server?

    Is there any feature you would miss by going with Dataguard instead of RAC?

  5. A question: is DataGuard really cheaper? I’ve never configured or used Dataguard, but I think it is not so simple. Using a standby server that is not as powerful as the primary is good for protecting data but does not give you HA: if your primary site has to serve 1000 users, the secondary site also has to be capable of serving 1000 users, otherwise you don’t have HA. Do you really think that replacing RAC with DataGuard to meet the same goals and SLAs will save you money?

  6. prodlife says:

    Fritz,
    Thank you!!! That was one of the most useful tips I ever got. I found the data I needed and then some more.
    Let me know when you visit San Francisco, because I definitely owe you a beer.

  7. prodlife says:

    Jarneil,

    Of course I benefit from rolling patches.
    Take one node down, install patch, start the node, stop second node, install patch, start second node, repeat on all nodes.

    No matter how long it takes me, the customer has no downtime. Perfect availability!

    Patchsets and upgrades are more painful with RAC.

  8. prodlife says:

    Dan,

    It’s good to know that we are not the only company struggling with this decision.

  9. prodlife says:

    Cristian,

    I think it will be cheaper. Remember that a requirement for HA with RAC is that one node should be able to carry 100% of the load, at least for a few hours.

    But the main point is that hardware costs are negligible compared to the Oracle license.

    The RAC license is $20,000 on top of the EE license. We are using 4-core HP servers with 12-16 GB of memory and the cheapest hard drives HP will sell, and we buy them for around $6,000 each.

    I admit that I still have to crunch the numbers, but I think there is a real possibility that RAC is too expensive.

  10. prodlife says:

    Steeve,

    I’ll miss RAC a lot 🙂

    Being able to take an Oracle server down without second thoughts is a gift for DBAs. We had a hardware crash last night, and I only found out about it in the morning. Boy, I’ll miss that.

    With DataGuard, I’ll still get the phone call at 2am.

    I don’t think we will save on HW costs. Actually, my biggest worry is that we will have to use two production servers and two standby for each three node cluster we take down, which will mean that we won’t be able to reduce costs at all.

    But as I said, I’ll need to run the tests and check the numbers before I will know for sure.

  11. Yes, I forgot to look at the price of the RAC option, which is probably excessively high. We benefit from RAC being included with Standard Edition, because we have relatively small installations and we do not use EE features or options. The difference in licensing cost is very high. That means that in our situation RAC is good, even if, as Mogens said, we probably don’t really need RAC.
    Thank you, and let us know how it goes.

  12. Chris says:

    Remember you can automate the failover in Dataguard using Fast-Start Failover, although this needs a third observer machine in the configuration.
    Also with RAC you may be able to save money by using services, although this would involve running multiple customers on a single cluster and managing the resources for each customer. I always get the impression this is a function of RAC that gets missed: the ability to have a 4/8/12 node cluster, run a service on 2 nodes for 90% of the time, and ramp up to as many nodes as are available when required. I think it gets missed mostly because the RAC implementations I have been involved with are quite small, 2-4 node affairs.

  13. prodlife says:

    Chris,
    Interesting point, but resource management is very difficult, more so politically than technically.

  14. Suresh K says:

    I guess there is little difference in the way RAC is perceived. If the goal is only HA, DG will score better than RAC anytime… but if the purpose is to have a system that needs scalability (I mean an ever-growing need for computing power), then RAC is the way to go… I’m not a big fan of having multiple small databases as part of a single cluster…

