Meaningless Foreign Keys

Posted: September 5, 2007 | Author: prodlife | Filed under: design, plsql |7 Comments

Jonathan Lewis wrote an article about Meaningless Keys. It’s an old article, but I just read it today and I found it very inspiring. So immediately I started thinking about my latest project.

For the project, I created a customers table. Each customer has a unique name, but it also has a meaningless ID, generated as sequence, and this meaningless id is used as a foreign key in other tables.

Why do I have this meaningless ID? Well, by now it was a habit, but I can still remember the reason – you don’t want to duplicate the same piece of data in multiple tables. Thats a known best practice for developers. But why? Because you don’t want to waste space – but in this case it is just varchar(50) column in a table that will never reach 100,000 rows, the space is not the issue. Also because if we will want to change the customer name we will have to change it in many places. We will have to hunt down its usage in all the child tables and update them as well. Thats the real bummer – it will complicate the application and make it more bug prone.

So, maybe Jonathan Lewis got it wrong? Or maybe I got him wrong? Maybe my meaningless key is not such a bad idea after all? Too bad, it would have been nice using the customer name as the key and avoiding all these join operations. If I could just solve the update problem. Why can’t I create a foreign key constraint with “on update cascade”?

Actually, I can.
Oracle didn’t provide this feature as built-in, but Tom Kyte wrote an update cascade package that does exactly what I just describe – it updates all the child references when you update a key on the parent table. I used it, it works great.
I would have given an example here, but this is Tom Kyte we are talking about, and of course he gave a great example that is much clearer than what I can put here. So just go check it out.

7 Comments on “Meaningless Foreign Keys”

dombrooks says:

September 5, 2007 at 7:54 am

I think you’ll find that Jonathan’s is not a sweeping generalisation, just that there are sometimes advantages and disadvantages.

My interpretation of that article is that sometimes when you’re using “meaningless keys” you have to go a long way just to do a lookup on the meaningless key value (as opposed to having to join tables to get other attributes of that value) and that can have performance implications.

In the case that he illustrates, area code was probably a candidate for being the primary key of table1 and could have been used in table4 as the value in the foreign key. Which would have made it nice and easy when looking at table4 for rows with an area_code of ‘EMEA’. But because it wasn’t, you couldn’t and had to go through table2 and table3 to map ‘EMEA’ to the meaningless key on table4.

You mention a customers table and their unique name as a candidate for the primary key. Remember that the value of primary key should be immutable – never changing. If the customer might change their name, do not use this. That should immediately rule this out as a candidate PK.

My advice – use a meaningless key where appropriate, use a natural and immutable key where appropriate.

Reply
Noons says:

September 5, 2007 at 8:28 am

the problem of course is defining “appropriate”.

What I’ve found is in any system where there is the possibility of incorrect data store – be it because of incorrect upfront verification, programming error, operator error, whatever – to use such data as a natural key is to ask for trouble in the long run.

Yes, in some cases it might make more sense to use natural keys. Yet, the disadvantages outweigh the advantages every single time.

So I tend to use synthetic keys as foreign keys, most of the time.

Reply
Keys « OraStory says:

September 5, 2007 at 8:49 am

[…] 5th, 2007 · No Comments I wrote a comment today on an article on meaningless keys on Chen Shapira’s blog, a comment that I subsequently felt was worth regurgitating in a blog […]

Reply
joel garry says:

September 5, 2007 at 1:58 pm

What’s my wife’s name?

In some cases it has nothing to do with my name, since she uses her maiden name professionally. In others, she shares my surname. Which means in some cases she had to change it. And her maiden name is like some imprisoned revolutionary, and her married name is like my cousin’s maiden name…

There have been some very interesting variants on her name, especially with credit check issues. Some of them have even propagated to my name in marketing materials.

In my day-to-day work, I’m always having to deal with customers’s customers buying each other, customers splitting out divisions and renaming them, and so forth.

I can only conclude that it is never appropriate to use a name as a primary key.

And of course, enterprise apps are going to violate the constraints on Tom’s package, at least sometimes, if not throughout.

Great blog, by the way.

Reply
prodlife says:

September 5, 2007 at 11:35 pm

Thanks for all the comments.

I completely agree that Jonathan didn’t mean that we need to always avoid the meaningless keys. It was my interpretation – I took his idea and started considering why we can’t take it too extremes.

All of the comments stress the idea of “immutability”, and I think this is the key issue here (unintentional pun). You can only use a meaningful primary key if you know it is immutable, and this will be an application requirement – which means that it will change three times a week (agile developers embrace change!). In our case, customer name is really immutable, we have a customer_display_name column which the customer can change if he wants.

Reply
joel garry says:

September 6, 2007 at 1:43 pm

A vendor of one of my customers has something like that. So they give me a login that is part my name and part my customers name. So my customer splits off a division into a separate company with the original name, and changes its name, keeps all software. This is not made up.

The vendor sells what used to be called RAD software (Rapid Application Development, Agile’s arthritic old grandpa).

Reply
Tony Johnson says:

September 7, 2007 at 4:58 pm

Like everything else about datbase design … “it depends”. Sometimes meaningless keys are the “Holy Grail” for your design and other times they can kill an application.

Reply