I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

The Small Difference Between XML and RDBMS December 14, 2008

Filed under: design, musing — prodlife @ 6:32 am

OracleNerd posted a rather heated discussion between himself and an application developer, and few people jumped in to comment.  I can’t recommend reading this discussion, it is full of unsupported assertions and rude language. Something that was mentioned in it, did catch my attention.

MCohen wrote “So for example, it shouldn’t really matter to me that we’re using an Oracle database over say, an object database or xml files, or Amazon S3. We’re not quite there yet, but we’re making progress.”

I worked on few projects that stored data in XML files. Usually it starts rather innocently. The program needs to store and retrieve few configuration parameters, and using a database for that seems like an overkill. So they put it in an XML file.

As the project grows, more configuration parameters are added. Some of them may be depended on each other, so if you update one and fail to update the other, you need to undo your first update. Congratulations, you just reinvented transactions.

Few weeks later the project have several components and maybe even threads. Sometimes, several of those need to access the data at once. Maybe you serialize all the access to the data, completely locking the file while it is updated. Maybe you want to allow reads while an update is going on, but how will you prevent reading partially updated data? Different RDBMS has different solutions for concurrency, but Java’s XML library will let you write your own. Enjoy.

I’m not saying XML is always a bad idea. Sometimes, you need your data store to be human-readable. Sometimes, your data is just not very well structured, and would not fit in an RDBMS. But using XML has a cost, and you should make sure it is worth it. To make the right decision, you should know what is the difference between an XML file and an RDBMS.

If you don’t know whether your data is in XML or in RDBMS, how will you know whether or not you need to implement your own transactions?