Keeping a tree in your relational DBPosted: August 2, 2007
One of the questions that our developers face again and again is how to keep hierarchical data in a relational database. Developers always run into this question because trees are one of the basic data structure developers use and like all data structures, sometimes they need to be serialized to the disk for persistence in case the server crashes or between restarts. Unlike other data structures, trees simply don’t map well to the relational database. The models are too different, and no matter which strategy you choose you will run into problems. Here are some solutions you can try:
1. Don’t use the DB at all. Keep the tree in a separate file. Thats the first solution I recommended to our developers – the DB is not a good place for trees, try using something else. This solution failed miserably when we needed to recovery from backups and discovered that data in the tree and in the DB were now inconsistent because we couldn’t recover the DB to the exact same time the file was backed up.
2. Don’t use a relational DB. Our developers chose OpenLDAP as a good place to keep trees. It is really a good solution from the development point of view. I’m not as happy because now I have to manage both Oracle and OpenLDAP, and I don’t like OpenLDAP all that much.
3. Keep the tree as a huge XML inside the DB. Very similar to keeping the data in a file, but without the recovery and consistency issues. Just don’t expect the DBA to help you if you need to get some data from inside the XML. Although Oracle now has built in support for XML, not all DBAs are fluent with the new creature.
4. In each row, have a column with the ID of the parent of this row. Kind of like pointers in the database. Use this and you are likely to run into a very painful situation where you need to use unbelievable number of self joins for any simple operation. IO usage will probably become an issue very quickly. You can also forget about having any usable index there.
5. In each row, have a column with a string that encodes the level in the tree that this row belongs to, and the ID of the parent. I’m not sure if this is similar to the previous idea or even worse.
6. According to Microsoft technology preview SQL Server 2008 should have a built in hierarchical data type. I think it just makes it easier for developers to make the wrong decision, but maybe Microsoft will surprise us all and the new data type will be both easy to use in queries and have decent performance.