I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Automatic Maintenance Tasks August 27, 2009

Filed under: 11g, tips — prodlife @ 1:48 am

Automatic Maintenance Tasks is a new 11g feature which I recently noticed. Its a bit embarrassing, since I’ve had 11g in production for nearly a year and apparently I’ve been using the feature all along.

I discovered the feature when I noticed that the automatic statistics gathering job is running several times on a weekend, instead of just once as it did in 10g. Then I discovered that the job has a very strange name starting with ORA$, and that the name changes every time the job runs.

Turns out that Oracle’s automatic jobs are not longer jobs. They are now Maintenance Tasks.

Here’s how Oracle defines the tasks:
“When a maintenance window opens, Oracle Database creates an Oracle Scheduler job for each maintenance task that is scheduled to run in that window. Each job is assigned a job name that is generated at runtime. All automated maintenance task job names begin with ORA$AT. For example, the job for the Automatic Segment Advisor might be called ORA$AT_SA_SPC_SY_26. When an automated maintenance task job finishes, it is deleted from the Oracle Scheduler job system. However, the job can still be found in the Scheduler job history.”

And here’s the reason my statistics ran several times on a weekend:
“In the case of a very long maintenance window, all automated maintenance tasks except Automatic SQL Tuning Advisor are restarted every four hours. This feature ensures that maintenance tasks are run regularly, regardless of window size”

What practically changed? Almost nothing, we had schedule windows in 10g, and the maintenance jobs (not tasks) ran within the defined windows. I’ve no clue why this change was necessary.

It definitely looks like infrastructure prepared for a future cool feature. At present, it just looks weird. For instance:

  • You can’t add tasks. Oracle has 3 predefined tasks – statistics, space advisor and tuning advisor. You can add or remove maintenance windows and define in which window to run each task, but you can’t add your own task.
  • There are lots of seemingly unnecessary definitions around. For example, from the dictionary tables, you can see there are task clients and task jobs. Currently it looks like they are the same thing, since there is a one-to-one relation between them, but it probably won’t stay this way.
  • The documentation doesn’t document much. There are fields such as client attributes with values that are not really explained anywhere.
  • The API is really weak. As I said, you can’t do much beyond enable/disable tasks in specific schedules

So far, it looks like this feature adds confusion but no value. I hope Oracle will do something fun with it in the future.

 

I Love NoCoug Training Days August 25, 2009

Filed under: advert, training — prodlife @ 2:08 am

Sometimes, life kind of loops on itself. A circle closes. You find yourself at the same spot you were two years ago, but from a completely different viewpoint.

Flashback two years and few month back: I’ve recently relocated to the US. I’m somewhat of an Oracle newbie, but I know that there are all kinds of cool DBA stuff going on, and I desperately want to be part of it. My colleagues tell me that HotSoS seminars are the best, so I asked my boss to send me to one. The request was left hanging in the air for weeks, just to be rejected. Because it was far away and I had to fly there and it was just too expensive. I was in tears.

And then I got this email from the local user group – NoCoug. They said they are doing a training day with Kyle Hailey. Kyle Hailey of Oak-Table fame was my hero at the time. I just finished reading Oracle Insights, and I was deeply impressed by his story of the program that could connect to the SGA directly through shared memory. I was all “Wow! Kyle Hailey! Only an hour drive away! And it costs just 250$! My boss will have to approve it! Hell, I’ll even pay for it myself!”.

And my boss did approve it. I went to the training day, and it was amazing. I learned more at that day than at the week long classes I took when I learned to be a DBA. What I learned then is still useful to me, almost every day on the job.

Obviously, I was deeply thankful to NoCoug for making it possible for me to attend this amazing event for a price my boss agreed to pay.

Back to present day: I am the training day coordinator for NoCoug. I want to create the same experience for every other DBA in our region. Top-notch training event at a price that won’t make your boss blink.

Everyone who knows me will laugh at the idea of me coordinating a training day. I usually can’t coordinate my own breakfast. But this is so important to me – every DBA should be able to be better at his job by learning from the best experts.

I also knew just who should lead my first training event – Tanel Poder is one of the best experts I know (Probably at the top 3 of my personal ranking), his blog and scripts and systematic troubleshooting ideas completely changed the way DBAs work. In a very good way.

And he agreed to give his famous “Advanced Oracle Troubleshooting” at North California, and we agreed on dates and prices, and I found a location. The impossible happened and I almost coordinated a training day.

Now I just need people to register so the event can really happen. I desperately want everyone to know about this event. I know it can improve the way people work so much its really a shame if someone will miss the opportunity. So even though I’m just a simple DBA and not a marketing expert, I’m going to do my best and annoy the hell out of everyone just to make sure that every single DBA in North California will know about this event. I even put a small funny looking ad in my blog.

If you can help me here by spreading the word to your NorCal friends – I’ll really appreciate the help.

P.S:
Advice on how to do non-annoying marketing for the event will also be appreciated. I know some of you have been promoting your own events for years. Please share your experience!

 

Preparing to Clone! August 17, 2009

Filed under: scripts — prodlife @ 10:54 pm

We are moving to this new data center. The new servers are in the new data centers, just waiting for Oracle to be installed on them. We have about 50 new servers to install.

Obviously we want to install them as quickly as possible, and taking as little human-work time as possible. To achieve this, we are checking to options – scripted silent installation and installation cloning.

I’m checking the installation cloning part, and I’m using Oracle’s Universal Installer User Guide for the process. One of the first steps in the process is:

At the source, you run a script called prepare_clone.pl.
This is a Perl script that prepares the source for cloning by recording the information that is required for cloning. This is generally located in the following location: $ORACLE_HOME/clone/bin/prepare_clone.pl.

When I tried to run the script, I found out that oracle user did not have execute permissions on the file. Then I found out that the script had the location of PERL hardcoded to the wrong location. Finally, I found out that the file had the usual amount of comments for an Oracle script, but only one line of code:
exit 0;

I found this incredibly amusing, so I decided to blog on this. While blogging, I took a closer look at the documentation, and found the following comment:

The need to perform the preparation phase depends on the Oracle product that you are installing. This script needs to be executed only for the Application Server Cloning. Database and CRS Oracle home Cloning does not need this.

I guess the joke is on me. Serves me right for not reading the instructions carefully.

 

You Can’t DeDupe Oracle DB Files August 14, 2009

Filed under: Storage — prodlife @ 1:57 am

One of our storage vendors has DeDupe technology. DeDupe is short for deduplication – the idea is that when you have identical blocks on the storage, it will only keep one copy of the block. This is a nice idea that saves on storage. Its especially good on shared filesystems where many users keep copies of identical files.

Our vendor loves DeDupe and managed to convince my storage manager that DeDupe will lead to amazing storage savings. Even on the DB volumes! No matter how much the DBAs protested that DBs rarely have full blocks that are identical to each other, the vendor kept insisting that many other customers have seen amazing storage savings using this technology on their data files. “Databases have many empty blocks”, the storage manager said after lunch with the vendor “And they are all identical! Think how much space you can save by keeping just one empty block!”.

We agreed to test DeDupe. As expected, we saw about  2% of space savings. Not exactly what the storage manager expected.

I wasn’t surprised. Even empty data blocks in Oracle DB files are not really identical. They have a header, which contains a relative address, which makes each empty block slightly different.

So, no DeDupe. Thought you may want to know, so you won’t have to repeat this experience. Maybe even send a link to your vendor :)

If your experience was different though, I’d love to know. The vendor insisted that he had many custormers happily deduping their databases.

 

Testing. Again. And Again. August 13, 2009

Filed under: musing, nerdism — prodlife @ 2:04 am

Suppose your boss asks you how long will it take to run a certain query, or update, or maybe an export.

You run the query (or update, or export) and it takes 5 minutes to complete. You can tell you boss “The query takes 5 minutes to complete”. But of course, you can’t be sure it will take 5 minutes every time. What if your boss runs it and it takes 6 minutes and 10 seconds? We all know that it is possible. For example, I may hack into your server and lock the table for one minute and 10 seconds just to make sure :)

A more experienced DBA (or developer) may run the query 10 times, average the results and tell the boss “I ran the query 10 times, and it took on average 5 minutes to complete”.

But this won’t tell the whole story. Maybe you should say “I ran the test 10 times. The first run took 8 minutes, because most of the data wasn’t in the cache. The rest of the runs took 4 minutes and 40 seconds on average.”

Or maybe even “I ran the test 10 times. It took between 3 to 8 minutes”. Maybe you even want to send him a small histogram of your results.

If you studied statistics, you can do even better:

First, you want to agree with your boss on an acceptable error. For example, if the average response times in your test end up to be 5 minutes, you can agree that any response time between 4 and 6 minutes is acceptable. This gives you 20% margine of error.

You also want to agree on a confidence level. Confidence level says how sure you are that your results are meaningful and not the product of random luck. You know, you can throw a perfectly balanced coin 10 times and get heads every single time. Obviously, the more you run the test, the more confident you can be that your results are valid.

Then, using your error margin and confidence intervals, you can decide how many times you want to repeat the test. Actually, you’ll also need the to know the variance between your tests. If you run the test 5 times and get wildly varying response times, you’ll want to run it many many times before you are sure of your averages. If you run it 5 times and get the same result – you are either doing something very wrong, or you can just send a very reliable answer to your boss.

So, run the query few times, estimate the standard deviation (a process known as bootstrapping) and then use R to give us our sample size:

n <- (qnorm(confidence level)*stddev)/(error margin))^2
(for example: (qnorm(0.99)*57.2)/(60))^2)

(qnorm is how R translates our confidence levels onto the normal curve).

If you think statisticians are anal, its only because you never worked with a real performance professional. The real professional is going to demand all of the above, and also grill you about the following:

  • Are you performing the tests sequentially or in parallel? How many sessions? Why?
  • Are there other load in the system while you are running the test? If there is, is this your typical load?
  • What if someone else modifies the same data in parallel?
  • Do you need to know how the query will perform during peak hours? If so, we need to test durin peak hours.
  • Planning to clean the cache between runs? why or why not?
  • Does your system need a ramp-up period before performance stabilizes?
  • Do you have other differences between the test system and production that can impact the results?
  • Are you sure you only want to look into the averages? After all, the outliers, 90th percentile and variance values can also indicate problems and future user complaints.
  • What about a nice histogram and 3 graphs to justify my consulting fees?

I’m not saying you should do all that whenever you need a quick estimate. Its just that sometimes you want to be a little more accurate – and you should know how to do it.

 

Concurrent Joins – Coming Soon to a DW near you? August 5, 2009

Filed under: hardcore, musing, nerdism, optimizer — prodlife @ 2:27 am

Oracle’s Concepts book says the following about the characteristics of Data Warehouses:

“A typical data warehouse query scans thousands or millions of rows.For example, “Find the total sales for all customers last month.”"

They also say:

“Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations.”

So we have multiple ad-hoc queries, each with its own plan, unaware of other similar queries that may be running at the same time, each reading millions of rows. Obviously they compete for resources. What if several concurrent queries are full-scanning the same table? What if this table is too large to fit entirely into memory?

It sounds obvious that in this case, all the separate execution threads will read their own blocks into memory, often “aging out” blocks that will be needed in just few seconds by another query. All competing for buffer-cache latches and blocks. The more queries we have concurrently on the system, the slower response times will be, due to competition on limited memory and IO resources.

Jonathan Lewis famously said: “The performance of a query should be related to the size of the data set you’re interested in, not to the size of the database”

I would like to add “… or to the number of concurrent users in the system”, but this is obviously untrue. Resource contention when the number of users rises has dramatic negative impact on performance. This is why we run load tests before putting new systems in production. We know that good performance with use in the lab does not guarantee good performance with 40 users in production.

But what if it doesn’t have to be this way? What if your database could, instead of optimizing each query seperately, could optimize the entire workload? So that with one or 40 or 256 users in the system we will still see very similar response times? What if the queries could share resources instead of competing for them?

All this is a rather lenghthy introduction to a cool idea I’ve ran into when scanning the program of the upcoming VLDB conference.

The paper I’ve read is called “A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses” and it is by George Candea, Neoklis Polyzotis and Radek Vingralek.

In the paper, the authors introduce CJOIN – a physical operator (i.e. an implementation of a relational operator) that can evaluate concurrent joins efficiently. It was written to allow sharing of CPU and IO resources and to fit modern DW systems – star schema, multiple cores, fast sequential scans and large memory.

The idea behind the CJOIN design is that there is a single physical plan that is “always on” and is optimized based on run-time statistics. Each new query can use this plan at any time and start sharing work with concurrent queries in the same plan.

Since all this sounds a bit vague, let me present the example that is given in the paper to demonstrate the idea. Then I’ll mention few of the more interesting points that are detailed in the paper, and hopefully after reading my descriptions you’ll decide to read the paper (which requires some effort on the part of the reader).

The design on CJOIN is based on a simple observation: Star queries all work by filtering a fact table through dimension tables and aggregating the results.

CJOIN works as a pipeline – receiving the input from a continuous scan of the fact table, passing the data through a set of filters, one for each dimension table, and distributing the results to aggregation operators that produce the output for *all the queries* using the operator.

Since the scan of the fact table is continuous, a query can start using the operator at any time, by remembering the point it registered and completing when the scan reaches this point again.

Suppose we have a fact table “Sales” with dimension tables “Customers” and “Products”.
Lets imagine the following two queries running concurrently:

Select sum(quantity) from sales, customers, products
where sales.customer_id=customer.customer_id and sales.product_id=products.product_id
and customers.city=’Japan’ and products.type=’High-Price’;

Select avg(dollar) from sales, customers, products
where sales.customer_id=customer.customer_id and sales.product_id=products.product_id
and customers.service_level=’Gold’ and products.type=’High-Price’;

As you can see, they share the same data source, but apply different filters and predicates.

Here’s how the CJOIN pipe will work:

The pipeline starts with a pre-processor, which receives rows from the continuous scan of the fact table and forwards them to the filtering part of the pipeline. Before doing so, the pre-processor adds few bits to each row – one bit for every query that is registered on the pipeline (i.e. queries that are interested in rows from this fact table). All the bits start out as “1″, signifying that at this stage all queries are interested in every row.

Now lets take a look at the filters:
We have a filter for each dimension table. The filter is a hash table that stores all the rows of that dimension that are of interest to any of our queries. Remember that while the fact table is too big to fit into memory, dimension tables are typically small enough to fit the memory of a nice DW server. Like the fact rows, the filter rows also have an additional byte per query.

So in our example, the “customers” filter will contain all the customers from Japan and customers with service_level “Gold”. The rows for customer from Japan will be have the first bit turned on and the second turned off, the row for Gold customers will have the reverse, because only the first query checks for customers from Japan and only the second checks for Gold customers. Products filter will contain the products of type “High Price” and both bits will be on, as both queries check for High Price products.

Note that when we start a new query, we need to add the appropriate rows from the dimension tables to the filters and remove them when the query is finished running. This is relatively quick because dimension tables are relatively small.

Now a row from the fact table arrives at the Customers filter. We will quickly check the customer_id on this row and see if it matches any row in the filter. If it exists, we know that at least one query wants this fact row. We can then check the query bits in the matching filter row to see which query needs it. If we see that only query 1 needs this fact row, then this row no longer interests query 2 and we can mark the second bit of the fact row as 0. If all query bits are marked as 0 , we can throw the fact row away. No one will need it.

In this way the row from the fact table passes through all the filters and arrives at the distributor. The distributor recieves fact rows that are relevant for at least one query in the current work load. It checks the bits to see which queries are interested in this row and sends it to the aggregators for these queries.

Once you got this example, you should be able to enjoy the paper. The paper actually contains this example, but with D1 instead of customers and d21 instead of Gold. I’m just a simple DBA and I understand better with a more concrete example.
You probably want to read the paper because it contains the algorithms for adding and removing queries from the pipe, so you’ll be convinced of how fast and clever this is.

The paper also contains a discussion of how to best parallelize this pipeline. Parallelization is very important in DW, and the paper offers several ideas and picks the best. It also has some ideas on how to handle concurrent updates to the tables, and ideas of how to adapt the CJOIN to other models except star schema.

Finally, the authors of the paper implemented their idea on top of PostgreSQL database, and they have extensive analysis of how the CJOIN indeed improve performance for concurrent workload (They seem to achieve almost linear growth in throughput as the number of concurrent queries grow!).

I hope you enjoyed this peek into the future of DBMS as much as I did and I hope to see CJOIN in Oracle soon :)