One More Thing Everyone Should Know About QueuesPosted: October 8, 2009
Don’t double the load on a server and expect that response times will be “a little slower”. If the server is above 30-40% capacity, doubling the load is likely to be catastrophic.
Suppose you have two servers, for example a Netapp heads, that are operating at 48% capacity each, and suddenly one of them fails and you have to move all the load to one of them. This means that you have one server at close to 100% capacity. Users will not complain that things are a bit slow, they will complain that nothing is working.
Someone once told me that if I have two RAC nodes running at 100% CPU each, I do not have high availability. The truth is that you stop having high availability long before the 200% CPU point.
Oh, and in case anyone wonders what we did about the overloaded Netapp. We stopped every non-essential process reading from this Netapp. This included snapmirrors, non-production databases, exports, vmwares, and probably more stuff than I know. This moved utilization down to 60% point and life was good (except that we weren’t too comfortable about lack of snapmirrors and exports).