Collecting Requirements for Disaster Recovery

When an earthquake wipes out your datacenter, it may be too late to do anything about it. Obviously you need to plan for such disasters in advance. Every IT organization, big or small, needs such a plan. I hope your organization already has one, and that you test it on a regular basis. But sometimes you need to plan for disasters almost from scratch, maybe because your business never had a disaster recovery plan, or maybe because availability requirements have suddenly changed and the previous plan is insufficient.

So, how do you start writing your disaster recovery plan?

If you are an Oracle DBA, you may be tempted to start by configuring Data Guard. If you are a sysadmin, you may be ordering additional machines and calling various ISPs. If you are a storage manager, you’ll probably pull out your vendor’s favorite remote mirroring solution. If you are in sales/marketing, you have probably already promised 99.99999% availability.

Don’t do any of that. You start by asking questions. Here are the questions we thought of a bit too late this time around; next time we’ll ask them before we even begin to discuss the right technology:

  1. What is the acceptable time to recovery? Can we just ship the tapes somewhere, or do we need a hot standby?
  2. How much data loss is acceptable? Can we recover from last night’s backup, or do we need data from 5 minutes ago?
  3. How much performance degradation is acceptable during a disaster? For how long? Can I save a bit on the extra hardware?
  4. How much redo is generated per day? That is, what rate of data change do we need to support now?
  5. What is the expected data growth for this DB/App for the next year? How much will we need to scale our solution?
  6. How will clients access the system in case of disaster? Do we need to migrate IPs or can we use new ones?
  7. How often do we need to validate the DR site? Every quarter, every 6 months, once a year?
  8. When does the DR need to be in place?
  9. How much downtime will be acceptable for returning to the main site? How far in advance do we need to schedule it?
  10. Who decides that it is now a disaster and that failover to the alternate site (or backups) should occur? What are the criteria for the decision?
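
Once you have answers to questions 2, 4 and 5, a little arithmetic turns them into numbers you can take to the network and storage people. Here is a minimal Python sketch of that arithmetic. The class name, the field names and the peak-to-average factor are my own assumptions for illustration, not something Oracle or any vendor defines, and the real redo rate should be measured on your own system rather than guessed.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


@dataclass
class DrRequirements:
    """Answers to a few of the questions above (all names are illustrative)."""
    rpo_minutes: float            # question 2: acceptable data loss window
    redo_gb_per_day: float        # question 4: redo generated per day
    yearly_growth: float          # question 5: expected growth, 0.5 means 50%
    peak_to_average: float = 3.0  # assumption: how much busier the peak is

    def average_redo_mbit_per_s(self) -> float:
        """Average redo rate in megabits per second (1 GB = 8000 megabits)."""
        return self.redo_gb_per_day * 8000 / SECONDS_PER_DAY

    def link_mbit_per_s_next_year(self) -> float:
        """Rough link size needed to ship peak redo after a year of growth."""
        return (self.average_redo_mbit_per_s()
                * self.peak_to_average
                * (1 + self.yearly_growth))

    def redo_at_risk_gb(self) -> float:
        """Redo generated during one data loss window, at the average rate."""
        return self.redo_gb_per_day * self.rpo_minutes / MINUTES_PER_DAY


# Example with made-up numbers: 86.4 GB of redo per day, 5 minutes of
# acceptable data loss, 50% growth expected over the next year.
example = DrRequirements(rpo_minutes=5, redo_gb_per_day=86.4, yearly_growth=0.5)
print(example.average_redo_mbit_per_s())
print(example.link_mbit_per_s_next_year())
print(example.redo_at_risk_gb())
```

The point is not the exact figures but that every number in the design traces back to an answer someone gave you. If the business changes its answer to question 2, you change one field and see what it does to the link you need.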

In my experience, the fewer questions you ask, and the simpler the questions are, the more likely you are to get good answers. And with good answers, you can choose your technologies, implement, test, rinse, repeat.


6 Comments on “Collecting Requirements for Disaster Recovery”

  1. Freek says:

    Another question you might ask is: What is the budget?
    No use in designing a Porsche when you only have the money for a Lada.

  2. mdinh says:

    Something to consider is Business Continuity.

    If there is a major earthquake in CA, will you be more concerned about recovering data or about your family?

  3. Chen Shapira says:


    Good point, thanks.

    I’ll be very busy with my family, water, etc.
    The DBAs from our EMEA team on the other hand will happily handle the failover to our DR site and continue running the business 🙂

    Now simultaneous earthquakes in CA and in UK can be a problem…

  4. […] Collecting Requirements for Disaster RecoveryWhat is acceptable time to recovery? Can we just ship the tapes somewhere, or do I need hot standby? How much data loss is acceptable? Can we recover from last nights backup, or do we need data from 5 minutes ago? … […]

  5. Chris says:

    One question I have seen get missed is:-

    where will the documentation be kept and who is responsible for keeping it up to date?

    Keeping your DR implementation plans at the same site as your data centre can make it a little tricky when a failover occurs.

  6. Asif Momen says:

    I think Chris has come up with a very good point which is mostly ignored.

    We had a total power failure at our HQ and the management decided to fail over to the DR site. But unfortunately, all the documentation/scripts were residing in HQ. We were lucky that God answered our prayers and the power was restored within minutes.
