Since I’m starting employment with Pythian on Monday, this is the last technical post that is exclusive to this blog. Future technical tips, discoveries and stories will appear on the Pythian blog, and either copied here or linked to from here. BTW. I already made my first post in the Pythian blog today, as the editor of the Log Buffer – and I’m not even an employee yet 🙂
So, lets end this with a bang. Real life story on how Daylight Saving Time caused one query to take twice as long as before! And its not even an April Fools joke 🙂
Three days ago the customer called me and said: Listen, since daylight saving time started, this process that should take two hours is now taking five. My manager gets his morning report late and he is unhappy.
I told him that I’ll be glad to help with his performance issue, but this can’t possibly related to day light saving time. Besides, since the process runs every night but some nights the performance is fine, the problem is probably in the nature of the data that changes from day to day.
In any case, we started working on the problem with our usual methods: Instrumented the PL/SQL of the process so we can know exactly what part is slowing down the process, ran trace, analyzed the results of traces, looks at AWR reports, look at query plans, etc.
Eventually, we found a query, buried within a view, whose explain plan looked completely unreasonable – First of all, it included a bunch of “nested loops” and even “cartesian merge join” between tables that I knew were quite large. Second, those large tables had estimated number of rows of 1.
I checked DBA_TABLES and saw that num_rows was indeed 0, and last analyzed was yesterday at 10pm. This didn’t make any sense – I personally collected statistics for this table yesterday, it had around 1M rows and it was at 3pm.
Why does it show 0 rows at 10pm?
A quick check with the customer confirmed that the process truncates this table and copies data into it before our slow query runs. I assumed that “truncate table” resets the statistics and asked the customer to modify the process to analyze the table after all the data is copied in, but before the slow query runs.
But two questions still bothered me: Does truncate really resets statistics? And why did this problem start occurring after day light saving time?
The answer to the first question is “no”, truncate does not reset statistics:
SQL> analyze table sitescope.test1 compute statistics; Table analyzed. SQL> select table_name,num_rows from dba_tables where owner='SITESCOPE' and table_name='TEST1'; TABLE_NAME NUM_ROWS ------------------------------ ---------- TEST1 4 SQL> truncate table sitescope.test1; Table truncated. SQL> select table_name,num_rows from dba_tables where owner='SITESCOPE' and table_name='TEST1'; TABLE_NAME NUM_ROWS ------------------------------ ---------- TEST1 4
The answer to the second question was found using DBA_TAB_STATS_HISTORY.
Turns out that there is an automated process that collects statistics on this schema every night. Until DST, it ran every night at 9pm, before the process started running – so the table had data and the collected statistics were accurate. After DST, the process started running at 10pm (obviously the system running the process doesn’t know about DST). At 10pm the table was already truncated, but was not loaded yet, so the process collected statistics for a table that was temporarily empty.
Tada! Daylight saving time caused a severe performance problem!
P.S. No indexes were rebuilt while attempting to resolve the issue 🙂
Oracle’s Concepts book says the following about the characteristics of Data Warehouses:
“A typical data warehouse query scans thousands or millions of rows.For example, “Find the total sales for all customers last month.””
They also say:
“Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations.”
So we have multiple ad-hoc queries, each with its own plan, unaware of other similar queries that may be running at the same time, each reading millions of rows. Obviously they compete for resources. What if several concurrent queries are full-scanning the same table? What if this table is too large to fit entirely into memory?
It sounds obvious that in this case, all the separate execution threads will read their own blocks into memory, often “aging out” blocks that will be needed in just few seconds by another query. All competing for buffer-cache latches and blocks. The more queries we have concurrently on the system, the slower response times will be, due to competition on limited memory and IO resources.
Jonathan Lewis famously said: “The performance of a query should be related to the size of the data set you’re interested in, not to the size of the database”
I would like to add “… or to the number of concurrent users in the system”, but this is obviously untrue. Resource contention when the number of users rises has dramatic negative impact on performance. This is why we run load tests before putting new systems in production. We know that good performance with use in the lab does not guarantee good performance with 40 users in production.
But what if it doesn’t have to be this way? What if your database could, instead of optimizing each query seperately, could optimize the entire workload? So that with one or 40 or 256 users in the system we will still see very similar response times? What if the queries could share resources instead of competing for them?
All this is a rather lenghthy introduction to a cool idea I’ve ran into when scanning the program of the upcoming VLDB conference.
The paper I’ve read is called “A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses” and it is by George Candea, Neoklis Polyzotis and Radek Vingralek.
In the paper, the authors introduce CJOIN – a physical operator (i.e. an implementation of a relational operator) that can evaluate concurrent joins efficiently. It was written to allow sharing of CPU and IO resources and to fit modern DW systems – star schema, multiple cores, fast sequential scans and large memory.
The idea behind the CJOIN design is that there is a single physical plan that is “always on” and is optimized based on run-time statistics. Each new query can use this plan at any time and start sharing work with concurrent queries in the same plan.
Since all this sounds a bit vague, let me present the example that is given in the paper to demonstrate the idea. Then I’ll mention few of the more interesting points that are detailed in the paper, and hopefully after reading my descriptions you’ll decide to read the paper (which requires some effort on the part of the reader).
The design on CJOIN is based on a simple observation: Star queries all work by filtering a fact table through dimension tables and aggregating the results.
CJOIN works as a pipeline – receiving the input from a continuous scan of the fact table, passing the data through a set of filters, one for each dimension table, and distributing the results to aggregation operators that produce the output for *all the queries* using the operator.
Since the scan of the fact table is continuous, a query can start using the operator at any time, by remembering the point it registered and completing when the scan reaches this point again.
Suppose we have a fact table “Sales” with dimension tables “Customers” and “Products”.
Lets imagine the following two queries running concurrently:
Select sum(quantity) from sales, customers, products
where sales.customer_id=customer.customer_id and sales.product_id=products.product_id
and customers.city=’Japan’ and products.type=’High-Price’;
Select avg(dollar) from sales, customers, products
where sales.customer_id=customer.customer_id and sales.product_id=products.product_id
and customers.service_level=’Gold’ and products.type=’High-Price’;
As you can see, they share the same data source, but apply different filters and predicates.
Here’s how the CJOIN pipe will work:
The pipeline starts with a pre-processor, which receives rows from the continuous scan of the fact table and forwards them to the filtering part of the pipeline. Before doing so, the pre-processor adds few bits to each row – one bit for every query that is registered on the pipeline (i.e. queries that are interested in rows from this fact table). All the bits start out as “1”, signifying that at this stage all queries are interested in every row.
Now lets take a look at the filters:
We have a filter for each dimension table. The filter is a hash table that stores all the rows of that dimension that are of interest to any of our queries. Remember that while the fact table is too big to fit into memory, dimension tables are typically small enough to fit the memory of a nice DW server. Like the fact rows, the filter rows also have an additional byte per query.
So in our example, the “customers” filter will contain all the customers from Japan and customers with service_level “Gold”. The rows for customer from Japan will be have the first bit turned on and the second turned off, the row for Gold customers will have the reverse, because only the first query checks for customers from Japan and only the second checks for Gold customers. Products filter will contain the products of type “High Price” and both bits will be on, as both queries check for High Price products.
Note that when we start a new query, we need to add the appropriate rows from the dimension tables to the filters and remove them when the query is finished running. This is relatively quick because dimension tables are relatively small.
Now a row from the fact table arrives at the Customers filter. We will quickly check the customer_id on this row and see if it matches any row in the filter. If it exists, we know that at least one query wants this fact row. We can then check the query bits in the matching filter row to see which query needs it. If we see that only query 1 needs this fact row, then this row no longer interests query 2 and we can mark the second bit of the fact row as 0. If all query bits are marked as 0 , we can throw the fact row away. No one will need it.
In this way the row from the fact table passes through all the filters and arrives at the distributor. The distributor recieves fact rows that are relevant for at least one query in the current work load. It checks the bits to see which queries are interested in this row and sends it to the aggregators for these queries.
Once you got this example, you should be able to enjoy the paper. The paper actually contains this example, but with D1 instead of customers and d21 instead of Gold. I’m just a simple DBA and I understand better with a more concrete example.
You probably want to read the paper because it contains the algorithms for adding and removing queries from the pipe, so you’ll be convinced of how fast and clever this is.
The paper also contains a discussion of how to best parallelize this pipeline. Parallelization is very important in DW, and the paper offers several ideas and picks the best. It also has some ideas on how to handle concurrent updates to the tables, and ideas of how to adapt the CJOIN to other models except star schema.
Finally, the authors of the paper implemented their idea on top of PostgreSQL database, and they have extensive analysis of how the CJOIN indeed improve performance for concurrent workload (They seem to achieve almost linear growth in throughput as the number of concurrent queries grow!).
I hope you enjoyed this peek into the future of DBMS as much as I did and I hope to see CJOIN in Oracle soon 🙂
Last year, I wrote a bit about table shrinking. It generated quite a bit of discussion and both Freek d’Hooge and Venz warned about some of the side effects involved, especially when using “enable row movement”.
I just discovered another effect of row movement. As the name implies, after enabling row movement, rows get moved around. This can impact the clustering factor of your primary key.
If you had an index on a monotonously growing column (such as “create date”), the table and the index would be ordered in the same way. This results in a low clustering factor, and it means that the index is considered more favorably by the optimizer for some operations. I’m not a big expert on the optimizer, but I assume that if the index and table are ordered in the same way, and you need to scan a portion of the table, the index can be used to find the first row and then the table itself can be scanned. This method should be about twice faster than scanning the index and going to the table for each row to get extra columns that are not in the index. (Again, this is common sense only, I did not test or even deeply studied this).
Anyway, so you have this nicely ordered table, and a nicely ordered index to match, and you probably enjoy all kinds of performance benefits from this arrangement.
Now you enable row movement and shrink. Rows get moved around, and the order of rows in the table no longer matches the order of rows in the index. So you have a smaller table (less blocks) and a higher clustering factor. Maybe scanning portions of the table will be faster since the table is smaller and you read fewer blocks, but maybe it’ll take longer because the ordered changed and the index is less usefull now. In any case, this is something you should consider before deciding to shrink the table with row movement – it can have unexpected effect on your plans.
Here’s a small test I ran to verify:
SQL> create table test (id number); Table created. SQL> INSERT INTO test SELECT rownum FROM dual CONNECT BY LEVEL <= 10000; 10000 rows created. SQL> commit; Commit complete. SQL> create index test_i on test (id); Index created. SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I'; TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR ------------------------------ ------------------------------ ---------- ---------- ----------------- TEST TEST_I 20 10000 16 SQL> delete from test where mod(id,2)=0; 5000 rows deleted. SQL> commit; Commit complete. SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. SQL> select t.table_name,i.index_name,t.blocks,t.num_rows,i.clustering_factor from user_tables t, user_indexes i where t.table_name=i.table_name and i.index_name='TEST_I'; TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR ------------------------------ ------------------------------ ---------- ---------- ----------------- TEST TEST_I 20 5000 16 SQL> alter table test enable row movement ; Table altered. SQL> alter table test shrink space cascade; Table altered. SQL> alter table test disable row movement ; Table altered. SQL> EXEC dbms_stats.gather_table_stats(ownname=>'SCOTT', tabname=>'TEST', estimate_percent=> null, cascade=> true, method_opt=>'FOR ALL COLUMNS SIZE 1'); PL/SQL procedure successfully completed. TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR ------------------------------ ------------------------------ ---------- ---------- ----------------- TEST TEST_I 8 5000 22
Tonight, at midnight, inserts slowed to a crawl at one of our main production databases. This database is used to process about 300 inserts every second, but for about 20 minutes starting at midnight, each insert took almost a minute. The problem resolved by itself, but now the application owner wants answers – what caused the slowdown at midnight?
A co-worker ran AWR and found that the top wait event is Enq: TM-Contention. A lock! This looks promising, but inserts usually don’t lock. Thats one of the most important principles of Oracle – readers don’t block writers and writers don’t block readers. The most locking I’d expect from insert will be a latch contention on a hot block. But co-worker is a very experienced DBA and he knew that TM-Contention can happen from non-indexed foreign key, and he started looking for one.
At this point I arrived at the office, and he updated me on the situation.
I don’t like AWR at all, so I took out my favorite tool – Confio. This is a nice graphical tool that helps with wait event analysis, but you can probably get pretty much the same data by digging into v$active_session_history yourself.
Confio also showed that TM-Contention is the largest wait event, but it also showed the queries waiting (lock table … in exclusive mode) and the machines. The table locked and the machines had nothing to do with the insert issue. A completely different part of the application. So, we can skip this event. I don’t think you can get this kind of information from AWR!
The second and third highest events are “db file sequential read” and “gc current request”. These events are so typical of selects (usually with a bad index), that if I saw them in AWR I’d probably assume they are unrelated, but Confio showed, clear as a day, that all the queries with this wait are inserts. I’m not talking about a fancy “insert into … select”, I’m talking about a plain “insert into … values”.
I still don’t know why a plain insert had to wait on “db file sequential read”, and why it had to do it only at midnight and stopped that 20 minutes later. But at least now I know where to look!
Note how AWR not only lead us in the wrong direction, it also doesn’t give enough information to follow up on the issue – like queries, tables and machines involved.
Jeff Smith, my favorite SQL Server blogger, wrote a great post about the importance of data types in database design.
Jeff is absolutely right – developers have absolutely no excuse for not getting data type corret. If you let an Excel script kiddie design your database, you deserve to be screwed.
If you are running Oracle, instead of SQL Server, the issue of data types is even more important than you think. It is not just a matter of getting the sort wrong, taking more space than you should, or being more vulnerable to typos. It is a matter of Oracle getting the execution plan right and getting you the data in an efficient way.
Here’s a nice example:
A bug tracking application was designed to support multiple databases. Since different databases handle dates very differently, the developers took a shortcut and decided to use numeric field for the date, use something like ‘20070301’ to signify that the bug was closed at March 1st, 2007, and write tons of application code to handle all kinds of trivial date operations. Note that they could have used Unix time, seconds since 1970, and avoid all the pain I describe here, but Unix time is less human readable. The application also supported generating reports. Number of bugs opened last week, two month ago, etc.
Suppose we have two years of bugs in our database. Thats 730 different possible closing dates. But if they are kept as numbers from 20060101 to 20071231, the optimizer will be convinced that there are 11130 different possible dates.
First scenario: We want to generate a report from January 1 to June 1 2007.
When we use real dates that means a quarter of our date range and the optimizer will correctly decide that it is faster to go do a table scan than use an index.
When you use numbers, 20070601-20070101 = 500. That is 4% of the data and indexes will be used to retrieve 25% of your data. Go grab a cup of coffee and maybe a nice book (“Cost Based Oracle Fundamentals” will be a good choice), it will be a while before you see the results of that report.
Second scenario: Suppose you want just one week of data. Say, March 1 to March 7. With both data types this is a very small percent of the data and an index will be used.
But what if you want to see the week from December 28 to January 3rd? If you use dates, this is still just 1% of the data and the same plan will be used with the same quick results. If you use the numbers, however, 20070103-20061228=8875. Thats around 50% of the data – this means a full table scan. You will see lots of managers wondering why the report became so slow “all of the sudden” and “just when we need the end of the year report very urgently”.
I hope I made my point here: Using correct data types is critical, more so in Oracle, and it is so easy you really have no reason to get it wrong.