Streams without Primary Key

Posted: December 31, 2008
We all know that every table should have a primary key. It is one of those database-architecture-101 basics. You need it for data integrity, and you need it for cross-references between tables. DBAs and developers argue about many things, but I’ve yet to hear anyone argue that a primary key is not needed.
Which is why I was very surprised to discover a schema for an application that contained 200 tables without primary keys. Apparently the application was not very picky and had no problem with duplicate values in these tables. The tables were tiny, so there were no performance issues. I decided to leave the issue alone, and all was well.
Until we decided to use Oracle Streams to replicate the schema to another DB. The Streams documentation says:
“If possible, each table for which changes are applied by an apply process should have a primary key. When a primary key is not possible, Oracle recommends that each table have a set of columns that can be used as a unique identifier for each row of the table. If the tables that you plan to use in your Oracle Streams environment do not have a primary key or a set of unique columns, then consider altering these tables accordingly.”
And then it says:
“In the absence of substitute key columns, primary key constraints, and unique key constraints, an apply process uses all of the columns in the table as the key columns, excluding LOB, RAW columns. In this case, you must create an unconditional supplemental log group containing these columns at the source database. Using substitute key columns is preferable when there is no primary key constraint for a table because fewer columns are needed in the row LCR.”
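If a table has a set of columns that happens to identify each row uniquely, even without a constraint, the documentation’s advice can be sketched like this (the table and column names here are hypothetical, for illustration only):

```sql
-- Hypothetical table APP.ORDERS with no primary key, where
-- (order_no, line_no) happens to uniquely identify each row.

-- At the source database: an unconditional supplemental log group
-- for the identifying columns (ALWAYS makes it unconditional).
ALTER TABLE app.orders
  ADD SUPPLEMENTAL LOG GROUP orders_key_log (order_no, line_no) ALWAYS;

-- At the destination database: tell the apply process to use these
-- columns as a substitute key instead of all the columns.
BEGIN
  DBMS_APPLY_ADM.SET_KEY_COLUMNS(
    object_name => 'app.orders',
    column_list => 'order_no,line_no');
END;
/
```

Note that none of this helps if the table actually contains duplicate rows, which is exactly the point of this post.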
I think the documentation doesn’t make a clear enough statement here. Let me restate what they are trying to say:
- You cannot use Oracle Streams to replicate a table which has duplicate rows. It will not work.
- If you don’t have a unique constraint on your table in some form, you will have duplicate rows.
The first point is important. Streams replication will not work with duplicates. There is no way around it. Here’s why:
Remember that with Streams replication you are taking DML transactions that run on one schema (I’ll ignore DDL for now, as it is irrelevant to this topic) and attempting to run them on another schema. This is done by mining the redo log and converting DML statements into LCRs. DML LCRs are also called row LCRs, and for a good reason – each DML statement is converted into one or more LCRs, each changing one row at most.
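For example (again with a hypothetical table), a single multi-row statement fans out into one row LCR per affected row:

```sql
-- Suppose this single statement happens to touch three rows:
UPDATE app.orders SET status = 'SHIPPED' WHERE batch_id = 42;
-- The capture process mines the redo for this transaction and
-- produces three separate row LCRs, each describing the change
-- to exactly one of those three rows.
```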
How does the apply process know which row? Remember that you cannot use the ROWID, because the change will be applied in a different schema.
If each row is uniquely identified by a set of columns, and if you configured Streams correctly by enabling supplemental logging of these columns, then the information in the supplemental log will be placed in the LCR and used to uniquely identify the row you are trying to modify.
But suppose the information in the DML statement and in the supplemental log is not enough to uniquely identify the row – perhaps because the row is not unique at all, because your application allows duplicates. Then apply will fail, and in the apply error queue you will see ORA-01422: exact fetch returns more than requested number of rows. Apply used all the information it had, expecting it to define a single row, but due to the duplicates it found more than one. It cannot apply a row LCR to more than one row, so it fails.
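As a rough analogy (not the literal internal implementation), apply behaves like a PL/SQL exact fetch: it looks the row up by the values it has and expects exactly one match. With duplicates, the same lookup finds two rows and raises the same error:

```sql
-- Analogy only: identify a row by its column values, expecting
-- exactly one match. If APP.ORDERS contains two identical rows
-- with these values, this raises
-- ORA-01422: exact fetch returns more than requested number of rows
DECLARE
  v_rowid ROWID;
BEGIN
  SELECT rowid INTO v_rowid
  FROM   app.orders
  WHERE  order_no = 1001 AND line_no = 1 AND status = 'NEW';
END;
/
```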
Oracle does offer a workaround of sorts – apply has a parameter called allow_duplicate_rows. Setting it to true makes the apply process update or delete one of the duplicates it found. Just one of them – while the original DML updated or deleted all of them.
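Setting the parameter is a one-liner (the apply process name here is hypothetical):

```sql
-- Let apply modify just one of the duplicate rows instead of
-- erroring out. Note the caveat: the source may have changed
-- all of the duplicates, the replica will change only one.
BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'my_apply',   -- hypothetical apply process name
    parameter  => 'allow_duplicate_rows',
    value      => 'Y');
END;
/
```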
Let’s reiterate – if you have duplicate rows in your original table, the Streams replica of the table will have these duplicates as well.
If you update or delete these rows in the original table, you will update or delete all of the identical rows. In the replica, however, either apply will fail and no row will be deleted, or it will succeed and delete only one of the rows, depending on the allow_duplicate_rows parameter. In either case, your replicated table is no longer identical to the original – because Oracle Streams replication does not work with duplicate data.