Streams without Primary Key

We all know that every table should have a primary key. It is one of those database-architecture-101 basics: you need it for data integrity, and you need it for cross-references between tables. DBAs and developers argue about many things, but I’ve yet to hear anyone argue that a primary key is not needed.

Which is why I was very surprised to discover a schema for an application that contained 200 tables without primary keys. Apparently the application was not very picky and had no problem with duplicate values in these tables. The tables were tiny, so there were no performance issues. I decided to leave the issue alone, and all was well.

Until we decided to use Oracle Streams to replicate the schema to another DB.

Oracle Documentation says:

“If possible, each table for which changes are applied by an apply process should have a primary key. When a primary key is not possible, Oracle recommends that each table have a set of columns that can be used as a unique identifier for each row of the table. If the tables that you plan to use in your Oracle Streams environment do not have a primary key or a set of unique columns, then consider altering these tables accordingly.”

And then it says:

“In the absence of substitute key columns, primary key constraints, and unique key constraints, an apply process uses all of the columns in the table as the key columns, excluding LOB, LONG, and LONG RAW columns. In this case, you must create an unconditional supplemental log group containing these columns at the source database. Using substitute key columns is preferable when there is no primary key constraint for a table because fewer columns are needed in the row LCR.”
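In practice, the two alternatives the documentation describes look roughly like this. This is a hedged sketch; the table, apply, and column names are invented for illustration, but `DBMS_APPLY_ADM.SET_KEY_COLUMNS` and the supplemental logging syntax are the real mechanisms the quote refers to:

```sql
-- Option 1: no usable key at all. Unconditionally log every
-- (non-LOB) column at the source so apply can match on all of them.
ALTER TABLE app_schema.lookup_codes
  ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- Option 2 (preferred): designate substitute key columns at the
-- destination, so the row LCR needs to carry fewer columns.
BEGIN
  DBMS_APPLY_ADM.SET_KEY_COLUMNS(
    object_name => 'app_schema.lookup_codes',
    column_list => 'code,description');
END;
/
```

Note that option 2 only helps if the chosen columns really are unique per row; it does not conjure uniqueness out of a table that allows duplicates.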

I think the documentation doesn’t make a clear enough statement here. Let me restate what they are trying to say:

  1. You cannot use Oracle Streams to replicate a table which has duplicate rows. It will not work.
  2. If you don’t have a unique constraint on your table in some form, you will have duplicate rows.

The first sentence is important. Streams replication will not work with duplicates. There is no way around it. Here’s why:

Remember that with Streams replication you are taking DML transactions that ran on one schema (I’ll ignore DDLs for now, as they are irrelevant to this topic) and attempting to run them on another schema. This is done by mining the redo log and converting DML statements into LCRs. DML LCRs are also called row LCRs, and for a good reason – each DML statement is converted into one or more LCRs, each changing at most one row.

How does the apply process know which row? Remember that it cannot use the ROWID, because the change will be applied in a different schema.

If each row is uniquely identified by a set of columns, and if you configured Streams correctly by enabling supplemental logging of these columns, then the supplementally logged information will be placed in the LCR and used to uniquely identify the row you are trying to modify.

But if the information in the DML statement and in the supplemental logging is not enough to uniquely identify the row – perhaps because the row is not unique at all, because your application allows duplicates – then apply will fail, and in the apply error queue you will see ORA-01422: exact fetch returns more than requested number of rows. Apply used all the information it had, expecting it to identify a single row, but due to the duplicates it found more than one. It cannot apply a row LCR to more than one row, so it fails.
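The failure is essentially the same one you get from an exact fetch over duplicated data. A contrived sketch (table name invented) that reproduces the error outside of Streams:

```sql
-- A keyless table with a duplicate row.
CREATE TABLE dup_demo (val NUMBER);
INSERT INTO dup_demo VALUES (1);
INSERT INTO dup_demo VALUES (1);

-- Apply effectively needs "the one row where val = 1".
-- A SELECT INTO over a duplicate raises exactly ORA-01422.
DECLARE
  r dup_demo%ROWTYPE;
BEGIN
  SELECT * INTO r FROM dup_demo WHERE val = 1;  -- two rows match
END;
/
-- ORA-01422: exact fetch returns more than requested number of rows
```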

Oracle does offer a workaround of sorts – the apply process has a parameter called allow_duplicate_rows. Setting it to true makes apply update or delete one of the duplicates it found. Just one of them. The original DML updated or deleted all of them.
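Setting the parameter looks like this; the apply name here is hypothetical, but the parameter and the `DBMS_APPLY_ADM.SET_PARAMETER` call are real:

```sql
-- Tell the (hypothetical) apply process 'my_apply' to tolerate
-- duplicates by modifying just one of the matching rows.
BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'my_apply',
    parameter  => 'allow_duplicate_rows',
    value      => 'Y');
END;
/
```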

Let’s reiterate – if you have duplicate rows in your original table, the Streams replica of the table will have these duplicates as well.

If you update or delete these rows in the original table, you will update or delete all of the identical rows. In the replica, however, depending on the allow_duplicate_rows parameter, either apply will fail and no rows will be deleted, or it will succeed and delete only one of them. In either case, your replicated table is no longer identical to the original, because Oracle Streams replication does not work with duplicate data.


9 Comments on “Streams without Primary Key”

  1. dombrooks says:

    It’s a classic documentationism, isn’t it?
    Why does documentation rarely just make it clear and concise just like that?

  2. Dotan Mazor says:

    It sounds like a lawyer was behind this phrasing: you can’t say that he didn’t tell you everything, but you can’t figure anything out either…

  3. prodlife says:

    Either a lawyer or someone from marketing 🙂

  4. Dotan Mazor says:

    Chen,

    It is weird, isn’t it? The way that marketing people manage to take a simple statement and convert it into an incomprehensible, obscure paragraph.

    Try out this Corporate bullshit generator. If you’re ever in a management meeting, just press the “more bullshit” link, and wait for the good time to use it. If you won’t use it, you can be sure that someone else will…

    Dotan

  5. Glenn Travis says:

    This post may need to be updated. I am running 10.2 EE with Streams. I had a problem with duplicate rows as well.

    I modified the capture processes parameter ‘allow_duplicate_rows’ as you’ve discussed above. This change has resolved all our issues.

    It updates ALL the rows, not just one of them. It will delete them also. I have done tests before and after, and various iterations of insert/update/delete on a table with no keys/indexes and several duplicates. After adding the parameter, both source and target are completely in synch.

    So I would submit that the ‘allow_duplicate_rows’ parameter does, in fact, “Allow Oracle Streams to replicate a table with duplicate rows. It does work.”

    Can you revisit your testing, and see if things have changed for you?

  6. Bob says:

    I believe that the assumption that your article makes is that you are duplicating data from one table to an exact copy somewhere else.

    I have a situation where my source table has duplicates (horrible – I didn’t create it!). However, I should be able to use Streams.

    For our system, we are running all LCRs through stored procedures on the apply side which completely transform the data to a completely different set of tables. So we don’t have to worry about updating a specific row in an exact copy of the source table.

    Now that I’ve said that, let me go and try to get it done before I go home 🙂 If I can do it, then your first bolded statement is incorrect (and scared the bejeezus out of me when I first saw it).

  7. iksa2 says:

    Thanks! This post has helped us a lot.

