Keeping a tree in your relational DB

One of the questions that our developers face again and again is how to keep hierarchical data in a relational database. Developers always run into this question because trees are one of the basic data structure developers use and like all data structures, sometimes they need to be serialized to the disk for persistence in case the server crashes or between restarts. Unlike other data structures, trees simply don’t map well to the relational database. The models are too different, and no matter which strategy you choose you will run into problems. Here are some solutions you can try:

1. Don’t use the DB at all. Keep the tree in a separate file. Thats the first solution I recommended to our developers – the DB is not a good place for trees, try using something else. This solution failed miserably when we needed to recovery from backups and discovered that data in the tree and in the DB were now inconsistent because we couldn’t recover the DB to the exact same time the file was backed up.

2. Don’t use a relational DB. Our developers chose OpenLDAP as a good place to keep trees. It is really a good solution from the development point of view. I’m not as happy because now I have to manage both Oracle and OpenLDAP, and I don’t like OpenLDAP all that much.

3. Keep the tree as a huge XML inside the DB. Very similar to keeping the data in a file, but without the recovery and consistency issues. Just don’t expect the DBA to help you if you need to get some data from inside the XML. Although Oracle now has built in support for XML, not all DBAs are fluent with the new creature.

4. In each row, have a column with the ID of the parent of this row. Kind of like pointers in the database. Use this and you are likely to run into a very painful situation where you need to use unbelievable number of self joins for any simple operation. IO usage will probably become an issue very quickly. You can also forget about having any usable index there.

5. In each row, have a column with a string that encodes the level in the tree that this row belongs to, and the ID of the parent. I’m not sure if this is similar to the previous idea or even worse.

6. According to Microsoft technology preview SQL Server 2008 should have a built in hierarchical data type. I think it just makes it easier for developers to make the wrong decision, but maybe Microsoft will surprise us all and the new data type will be both easy to use in queries and have decent performance.

Advertisements

4 Comments on “Keeping a tree in your relational DB”

  1. With my experience I totally disagree with -> “Unlike other data structures, trees simply don’t map well to the relational database.” of course for Oracle 🙂

    Please choose your release from http://tahiti.oracle.com and search “Hierarchical Query” – http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries003.htm#i2060615

    Also as always psoug may help with its example demos – http://psoug.org/reference/connectby.html

    You may test and see the results yourself.

    Best regards.

  2. dombrooks says:

    I’m surprised that you think this is a problem storing this in a database, Oracle at least.

    >You need to use unbelievable number of self joins for any simple operation
    Not in Oracle using “connect by”

  3. Oliver says:

    If you use 4, then the sql statement
    select .. connect by prior / connect by root
    comes in very handy.
    You don’t need to self-join the tables and the execution plan looks much better !

    I would recommend this to all other posted solutions, but beware of a naasty bug in 10.2.0.2 where a connect by can result in ORA-0600

  4. prodlife says:

    Wow, looks like I missed the elephant in the living room.

    I ran into queries using “Connect by” in the past while browsing “Ask Tom”, but I always filed it in the should-check-this-later drawer together with analytics and never got around to actually learning to use it.
    Obviously, I should have.

    So, it looks like SQL Server is simply catching up with a feature that Oracle already had for a while and I simply missed all the action.

    Thanks for the tips and links.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s