Save a DB – Practice Restore Today

Tom Kyte published his new year resolutions, and his first one is “Practice restore every month”. According to Tom, practicing restore on a regular basis is the best way to avoid making mistakes on restore, which is something DBAs can’t afford. I agree, and I have few tips for restore practicers among you.

  1. Don’t do it on production. OK, this one is trivial. What I mean is that you can’t really practice restore  on your private “fun and games” DB, because it is likely to be very different than the real ones. Less data usually, and probably it is not connected to the Netapp either.
    We practice our recovery on our staging environment which is a copy of our production, but this means we have to schedule our practices to times that no one is using the staging for their own tests. QA databases are also fun to practice on, in previous workplace QA trashed their DB so often that I practiced recovery on an almost weekly basis.
  2. You can design some of the internal procedures so you’ll have to recover all the time. For example, decide that you will create clones of existing DBs by copying files over, without shutting down the DB, and leave out a file or two for the challenge. If you can’t recover from what you made, make sure you understand why and verify that this situation can never ever occur in production. If you (like me) have to clone DBs on a somewhat regular basis, this gives you an opportunity to use the time to practice recovery without having to justify the practice to anyone.
  3. Get your manager to watch the practice. This has two great benefits – one is that you get to practice in realistically stressfull situation of having a manager looking over your shoulder. The other is that when you recover in an emergency, the manager will be somewhat familiar with what you do and will ask less questions when you are trying to think.
  4. By practicing recovery on realistic environments in advance you will have a good idea how long recovery should take. Try to finish each practice with a nice table saying something like  “For each 20G of lost data files, add 30 minutes to recovery, for each 6 hours of redo to apply add an hour to recover”.  Try to get the business stakeholders to sign off these numbers. If they refuse, maybe its time to think of new strategy. Maybe one involving database flashback or dataguard. In any case, when you are in a real recovery situation these numbers  will allow you to give your business a reasonably accurate estimation of how long the downtime is going to last. This is always a huge win for the DBAs and reduces the stress levels significantly.
  5. Recovery is one of these things that are awfully complicated when you memorize a bunch of rules. Do I need the instance up or down? Do I need a backup of the control file? Which redo logs do I need? Do I recover until cancel, until point in time, until now? It  magically becomes very easy when you understand the idea behind those rules. Don’t remember when you need the backup control file, instead take few minutes (hours, days) to understand the idea of SCNs and how they are used in recovery. So, when you practice, make sure you understand why you do each step that you do, don’t just work off a checklist.
  6. Don’t throw away the checklist. Either have a bunch of checklists ready for every scenario (my method) or write a checklist and have someone review it before starting recovery (senior DBA does that). Always work with a list. You don’t want to risk forgeting any step.
  7. Don’t forget to verify the production backups themselves on a very regular basis. Nothing is worse than discovering that for some reason your backup is useless. I try to always keep a recent full  export on the side of such emergencies. I don’t think we ever had to use it, but we came very close once or twice and it was a comforting thought to know that if we can do nothing else, at least the export is there.
About these ads

7 Comments on “Save a DB – Practice Restore Today”

  1. Hi Chen,
    in point 7 you say “i try to keep a recent full export…” this makes me answer you: have you ever tried to import a full export? how does it work? i’ve tried but i’ve never completely understand how i can import a full export, have i to create a database without create data dictionary? If i create a new db and then i try to import a full export i receive errors on existent object (sys and system) … how do you do?

  2. Practicing restore is much more fun if you do it in teams, one team does the most perfide corruption and the other team try to recover. The team that failed recovering from any disaster is fired. A bit like in the OCM practicum :mrgreen:

  3. prodlife says:

    Hi Cristian,

    We never had to do it during an incident, but the way we plan to do it is to first create a new database (with data dictionary) and then we import the user schema (but not sys and system). We do lose things like AWR snapshots and session history, but if you got to the point that you recover from export, this is the least of your worries.
    I suspect that you can also import sys and system and ignore the errors on existing objects, but I never did that.

  4. prodlife says:

    Laurent,
    Its a good idea, and mentioned often, but I never really saw it working in practice. Did you work in a place that did this?

  5. [...] her post Save a DB – Practice Restore Today, Chen Shapira makes note of Tom’s New Year Resolutions, particularly, “Practice restore [...]

  6. Chen,
    your post is precious, it resume well how the dba should act after data loss!
    about export/import, when you try to recover from a full export you must pay attention to some things:
    - you should have db creation scripts and pfiles to reduce your time-to-recover
    - many options should be reinstalled properly on target db
    - doing partial imports you loose public database links and public synonyms, roles, ecc.
    - take care about charsets
    - take care about jobs and schedules
    - check for release bugs about sequences numbering
    - take care about blobs and commit=Y (if I remember well imp commits every row that has a blob, so you’ll wait long time on log file sync)
    - fromuser-touser import does not recreate empty schemas, schema definitions, tablespaces (see note 10767.1).
    to have the whole definitions you must do a full import.
    expdb/impdp are much better, faster and flexible but actually they fail with on tables with xmltype.
    another good practice is always check your dump files with imp show=Y to prevent unreadable dump files.

  7. prodlife says:

    Ludovico,

    Thanks for the great tips :)


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,110 other followers