Daylight Saving Time Causes Performance Issues!

Posted: April 2, 2010 | Filed under: Analysis, optimizer, statistics | 10 Comments
Since I’m starting employment with Pythian on Monday, this is the last technical post that is exclusive to this blog. Future technical tips, discoveries and stories will appear on the Pythian blog, and will either be copied here or linked to from here. BTW, I already made my first post on the Pythian blog today, as the editor of the Log Buffer – and I’m not even an employee yet 🙂
So, let’s end this with a bang: a real-life story of how Daylight Saving Time caused one query to take twice as long as before! And it’s not even an April Fools’ joke 🙂
Three days ago the customer called me and said: “Listen, since daylight saving time started, this process that should take two hours is now taking five. My manager gets his morning report late, and he is unhappy.”
I told him that I’d be glad to help with his performance issue, but that it couldn’t possibly be related to daylight saving time. Besides, since the process runs every night but on some nights the performance is fine, the problem was probably in the nature of the data, which changes from day to day.
In any case, we started working on the problem with our usual methods: we instrumented the PL/SQL of the process so we would know exactly which part was slowing it down, ran traces, analyzed the trace results, looked at AWR reports, looked at query plans, etc.
Eventually, we found a query, buried within a view, whose explain plan looked completely unreasonable. First of all, it included a bunch of “nested loops” and even a “cartesian merge join” between tables that I knew were quite large. Second, those large tables had an estimated row count of 1.
I checked DBA_TABLES and saw that num_rows was indeed 0, with last_analyzed showing yesterday at 10pm. This didn’t make any sense – I had personally collected statistics for this table yesterday at 3pm, when it had around 1M rows.
So why did it show 0 rows at 10pm?
A quick check with the customer confirmed that the process truncates this table and copies data into it before our slow query runs. I assumed that “truncate table” resets the statistics and asked the customer to modify the process to analyze the table after all the data is copied in, but before the slow query runs.
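That fix looks roughly like this – a sketch using DBMS_STATS (the owner and table names are taken from the example later in the post; the exact parameters would depend on the customer’s process):

```sql
-- After the load completes, and before the slow query runs:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SITESCOPE',
    tabname          => 'TEST1',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE,           -- gather index statistics as well
    no_invalidate    => FALSE);         -- invalidate cached cursors immediately
END;
/
```

Setting no_invalidate to FALSE matters here: it forces dependent cursors to be re-parsed, so the next run of the query picks up a plan based on the fresh statistics.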
But two questions still bothered me: does truncate really reset statistics? And why did this problem start occurring after daylight saving time?
The answer to the first question is “no”, truncate does not reset statistics:
SQL> analyze table sitescope.test1 compute statistics;

Table analyzed.

SQL> select table_name,num_rows from dba_tables where owner='SITESCOPE' and table_name='TEST1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST1                                   4

SQL> truncate table sitescope.test1;

Table truncated.

SQL> select table_name,num_rows from dba_tables where owner='SITESCOPE' and table_name='TEST1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
TEST1                                   4
The answer to the second question was found using DBA_TAB_STATS_HISTORY.
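For reference, the statistics history can be pulled with a query along these lines (a sketch against the DBA_TAB_STATS_HISTORY view, using the table names from the example above):

```sql
-- When were statistics on this table last updated, and by which runs?
SELECT table_name,
       stats_update_time
FROM   dba_tab_stats_history
WHERE  owner      = 'SITESCOPE'
AND    table_name = 'TEST1'
ORDER  BY stats_update_time;
```

A row with an update time of 10pm, right in the middle of the load window, is exactly the kind of smoking gun this view is good for.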
Turns out there is an automated process that collects statistics on this schema every night. Until the DST change, it ran every night at 9pm, before the load process started – so the table had data and the collected statistics were accurate. After the DST change, it started running at 10pm (obviously the system running it doesn’t know about DST). At 10pm the table had already been truncated but not yet loaded, so the job collected statistics for a temporarily empty table.
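On 10g, the schedule of a job like that can be checked from the scheduler views (a sketch; the LIKE pattern is an assumption about how the customer’s job was named):

```sql
-- When is the nightly stats-gathering job due to run next?
SELECT job_name,
       repeat_interval,
       next_run_date
FROM   dba_scheduler_jobs
WHERE  job_name LIKE '%STATS%';
```

Note that next_run_date is a TIMESTAMP WITH TIME ZONE; whether a job shifts with DST depends on the time zone its schedule was defined in.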
Tada! Daylight saving time caused a severe performance problem!
P.S. No indexes were rebuilt while attempting to resolve the issue 🙂
Not a technical comment, just want to say, “Priceless information!” and thanks.
P.S. No indexes were rebuilt while attempting to resolve the issue
I simply love the last line. Very well written and thanks for sharing this.
Yeah, 10.2.0.4 dbconsole is actually 10.1.0.6, so standard RMAN backup doesn’t know about the DST fix, even though it supposedly is fixed. So backup jobs have been running an hour late since the time change; the jobs display shows the wrong time, but backup reports show the time they actually ran, even though the system was rebooted last weekend, since job times are set from the last run or something. And support is telling me to… never mind. But hour-long backups taking two hours is a performance issue.
Appreciate the article, nicely done. Not sure where you’re located, but every 10g instance I’ve worked on (US Eastern, Central and Mountain TZ’s) runs the automatic stats gathering job between 10pm and 6am. BTW, it’s been around since the intro of 10g, 2003 I think? It’s important to be aware of it for cases like the one you just encountered. It’s possible to cancel the job via the 10g scheduler, or stats for a table that undergoes changes during the window can be locked.
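The locking the commenter mentions looks roughly like this (a sketch using DBMS_STATS, with the table names from the post’s example):

```sql
-- Stop the maintenance-window job from overwriting stats on a volatile table:
EXEC DBMS_STATS.LOCK_TABLE_STATS('SITESCOPE', 'TEST1');

-- The load process can still gather its own stats by overriding the lock:
EXEC DBMS_STATS.GATHER_TABLE_STATS('SITESCOPE', 'TEST1', force => TRUE);
```

With the stats locked, the 10pm job skips the table, and only the deliberate gather after the load (with force => TRUE) updates them.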
Cheers, and best of luck at Pythian!
This is a great post – I have run into the same thing with stats, although due to legacy “everyone who knew that was running on that odd schedule is gone” rather than DST. As for DST, we recently missed a few backup runs because some were scheduled in cron at 2am. But 2am doesn’t exist on the morning of DST changeover. It goes from 01:59 to 03:00. Oops. Definitely caused some casting around and false theories before we figured it out and fixed our scheduling.
I thought everyone knew that truncate table leaves statistics as is?
My approach most of the time is to use a PL/SQL procedure that (along with some logging) truncates the table and then deletes the table stats (with no_invalidate set to false), so queries against it fall back to dynamic sampling until fresh stats are gathered.
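A minimal sketch of that approach (the procedure name is hypothetical and the commenter’s logging is omitted):

```sql
-- Truncate, then drop stats so the optimizer uses dynamic sampling
-- instead of trusting stale (or zero-row) statistics.
CREATE OR REPLACE PROCEDURE reload_prep(p_owner VARCHAR2, p_table VARCHAR2) AS
BEGIN
  EXECUTE IMMEDIATE 'TRUNCATE TABLE '
    || DBMS_ASSERT.ENQUOTE_NAME(p_owner, FALSE) || '.'
    || DBMS_ASSERT.ENQUOTE_NAME(p_table, FALSE);
  DBMS_STATS.DELETE_TABLE_STATS(
    ownname       => p_owner,
    tabname       => p_table,
    no_invalidate => FALSE);   -- invalidate cursors so new plans are built
END;
/
```

With no statistics at all, the optimizer dynamically samples the table at parse time, which avoids the “1 estimated row” plans the post describes.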
Everyone has to learn this at some point 😉