A Year with StreamsPosted: March 3, 2009
Whenever Tom Kyte is asked about replication, he has a default answer: Don’t.
Well, last month marked our first year anniversary of our use of streams based replication to create a reporting DB.
In many ways, I’m very proud of this solution. Our reporting DB allows our software-as-a-service to offer very flexible integrations with all kinds of reporting systems without having us worry about impact on our production environment. Our senior DBA designed a terrific architecture that allows for relatively pain-free maintenance on the system. We have several customers using this solution and it seems to be making some profit.
For a year anniversary, we celebrate by announcing a formal SLA for the service. SLA discussions are always interesting. We are looking at 3 important components:
- Setup times – How long after the customer requested this service do we expect to deliver it?
- Availability – How many hours a month do we expect the service to be down?
- Time to replication – A change was made on production. How soon can we show it on the reporting DB?
Setup times are our most volatile component – we range between 15 minutes when everything goes well, to around 2 month when we run into an Oracle bug, Oracle offers no work-around and patch after patch fails to solve the problem.
With that kind of variance, offering any kind of SLA seems a bit futile. Next week I’m at HotSos, Maybe Robyn Sands has the solution. BTW. I find it incredible that until last month I’ve never heard of Robyn Sands, and now not only I hear about her all over the place, the more I hear about her, the more I like her. A lady who’s an Oracle pro, knows her stats and wants to stop root cause madness. I can’t wait to meet her.
Back to the point, we don’t see how we can offer an SLA on the setup time, which is bad. What about the availability after the service is up? We are doing much better here, in the sense that downtimes are predictable. The problem is that we have many of them. We have mandatory 2 hours a month due to an Oracle bug that requires us to rebuild the environment from scratch every month. We also have an additional monthly downtime for applying the various patches we need in order to solve other bugs.
In short, Streams is still buggy. So even if you design your replication to be very safe, put primary keys on all your tables, and get everything else correct, you still have trouble maintaining high availability and predictable setup times. This is for a product that has been out since 9.*
What about replication times? Here, I feel like the guy who jumped off the 50th floor window. Somewhere around floor 30 someone else looks out his window and asks him “hey! how are you doing?” and he replies “So far, so good!”. So far, we’ve been able to resolve all the performance problems we had with replication (#1 tip – don’t use “like” in replication rules).
With all the troubles we are having with Streams, would I use this solution again? I think that given all the constraints we had to work with, replicating data to a reporting DB was the best solution.
I would have preferred a different solution. I would have liked our product to have a strong, consistent, web services API that would allow integration with external systems without any need to a direct access to the database. Unfortunately, the web services API arrived late, and we have to support versions that don’t have it yet. Also, integrating with web services API seems to require more work than database level integration and business was very eager to offer an easier solution.
Another solution that might have been preferable (although I’m less sure), is a reporting utility where we could load customized queries for our customers, run them as a low priority process on production and send the customers the results. The problem with this solution is that it may require more work than Streams (keeping an eye on all those custom code running on production), and guaranteeing response times for the reports would have been an issue. With the reporting database we somehow managed not to have any guarantee regarding performance of the reports.
To sum things up, when streams is good it is very very good, and when it is bad, it is still somewhat better than the alternatives. At least it gives me lots of blog material.