Suppose your boss asks you how long will it take to run a certain query, or update, or maybe an export.
You run the query (or update, or export) and it takes 5 minutes to complete. You can tell you boss “The query takes 5 minutes to complete”. But of course, you can’t be sure it will take 5 minutes every time. What if your boss runs it and it takes 6 minutes and 10 seconds? We all know that it is possible. For example, I may hack into your server and lock the table for one minute and 10 seconds just to make sure
A more experienced DBA (or developer) may run the query 10 times, average the results and tell the boss “I ran the query 10 times, and it took on average 5 minutes to complete”.
But this won’t tell the whole story. Maybe you should say “I ran the test 10 times. The first run took 8 minutes, because most of the data wasn’t in the cache. The rest of the runs took 4 minutes and 40 seconds on average.”
Or maybe even “I ran the test 10 times. It took between 3 to 8 minutes”. Maybe you even want to send him a small histogram of your results.
If you studied statistics, you can do even better:
First, you want to agree with your boss on an acceptable error. For example, if the average response times in your test end up to be 5 minutes, you can agree that any response time between 4 and 6 minutes is acceptable. This gives you 20% margine of error.
You also want to agree on a confidence level. Confidence level says how sure you are that your results are meaningful and not the product of random luck. You know, you can throw a perfectly balanced coin 10 times and get heads every single time. Obviously, the more you run the test, the more confident you can be that your results are valid.
Then, using your error margin and confidence intervals, you can decide how many times you want to repeat the test. Actually, you’ll also need the to know the variance between your tests. If you run the test 5 times and get wildly varying response times, you’ll want to run it many many times before you are sure of your averages. If you run it 5 times and get the same result – you are either doing something very wrong, or you can just send a very reliable answer to your boss.
So, run the query few times, estimate the standard deviation (a process known as bootstrapping) and then use R to give us our sample size:
n <- (qnorm(confidence level)*stddev)/(error margin))^2 (for example: (qnorm(0.99)*57.2)/(60))^2)
(qnorm is how R translates our confidence levels onto the normal curve).
If you think statisticians are anal, its only because you never worked with a real performance professional. The real professional is going to demand all of the above, and also grill you about the following:
- Are you performing the tests sequentially or in parallel? How many sessions? Why?
- Are there other load in the system while you are running the test? If there is, is this your typical load?
- What if someone else modifies the same data in parallel?
- Do you need to know how the query will perform during peak hours? If so, we need to test durin peak hours.
- Planning to clean the cache between runs? why or why not?
- Does your system need a ramp-up period before performance stabilizes?
- Do you have other differences between the test system and production that can impact the results?
- Are you sure you only want to look into the averages? After all, the outliers, 90th percentile and variance values can also indicate problems and future user complaints.
- What about a nice histogram and 3 graphs to justify my consulting fees?
I’m not saying you should do all that whenever you need a quick estimate. Its just that sometimes you want to be a little more accurate – and you should know how to do it.