First International NoCoug SQL Challenge – And the Winner is…

Iggy Fernandz invented an impossible challenge.
NoCoug and Apress sponsored an international competition.
9 SQL experts from all over the world submitted clever solutions.
SQL Guru Dan Tow reviewed the solutions and picked his favorite.

Alberto Dell’Era wins the challenge for his wonderful solution using Discrete Fourier Transforms; the runner-up is André Araujo from Australia, who used binary arithmetic and common table expressions in his solution.

The Order of the Wooden Pretzel will be bestowed on Alberto but the real prize is six books of his choice from the Apress catalog. André will receive a prize of six e-books of his choice.

Congratulations to the winners! Iggy will contact you soon to arrange for the delivery of your bounty. Thanks to everyone who participated. We’ll see you again in April for the second SQL challenge. BTW. If you have good SQL riddles – something fun and difficult, send them my way. Who knows, maybe you’ll be featured as the next challenging wizard.

Which reminds me: Last April, when Iggy presented his challenge, he told me that he wrote his own highly efficient solution to the problem. Now that the challenge is over and the winners have been announced – it is high time he will present his own solution to the world. Please go to his blog and bug him a bit about this. I’m sure a little peer pressure will make him see the light. Thanks 🙂

NoCoug will publish Iggy’s review of the winning result in the NoCoug Journal. If it’ll be posted online I’ll link to it. Iggy’s article  includes an explanation of the decision by Dan Tow and also an explanation of the mathematics behind Alberto’s solution that I actually almost kind-of understood after reading both Alberto’s and Iggy’s explanations for around 4 hours one afternoon. My university would take my degree back if they knew, I’m sure.

One thing that bugs me is that Alberto’s winning solution is not very interesting from a SQL perspective. Note that the winning solution is the DFT solution which uses cartesian joins and subquery factoring, not the FFT solution which uses SQL Model. Dan Tow said that Model is iterative and therefore violates the spirit of SQL – which is about sets and relations. I can understand his sentiment, except that Model is part of the SQL standard.

So the winning solution is just plain SQL. It is algorithmically brilliant and I completely admire Alberto for coming up with it, but still just plain (very long) SQL. And I was wondering – this is a SQL challenge and not a math challenge. Shouldn’t the winning solution demonstrate more SQL brilliance and less math brilliance?

On the other hand, the winning solution demonstrates one of the most important principles of programming – Using a clever algorithm (such as FFT) will give you performance and scalability that no amount of query and database tuning will ever achieve. So in away its light use of SQL features does make it a good win for a SQL challenge.

My personal favorite solution is Fabien (Walder) Contaminard‘s. He still uses more math than SQL. He uses the multinomial distribution (which unlike’s Alberto’s math I actually understood). I thought it was a simple, straight forward, readable and very clever solution. Of course, it was slower than the winning entry, but he did find some interesting differences between SQL Server and Oracle while trying to optimize it.
Here’s a summary of all submitted solutions:

  1. Laurent Schneider’s solution using CONNECT BY and XMLQUERY (Switzerland).
  2. Rob van Wijk’s solution using the MODEL clause (Netherlands).
  3. Vadim Tropashko’s solution using 11gR2 recursive joins This entry should win the shortest solution award. (USA).
  4. Craig Martin’s solution using CONNECT BY and logarithms (USA).
  5. Alberto Dell’Era’s two solutions using Fourier transforms (Italy).
  6. Fabien (Waldar) Contaminard’s solution using the multinomial distribution (France).
  7. Almost anonymous Postgres solution using pipeline functions (Romania).
  8. André Araujo’s solution using binary arithmetics and subquery factoring (Australia)

Thanks for playing. See you all next Year. Don’t forget to send me your SQL riddles so I can pick the best for the next challenge (and spend long weekends trying to solve them all!)


Got Lost Following NoCoug SQL Challenge?

So did I.

For the last few weeks there were links flying everywhere. Solutions, people commenting on solutions, people commenting on comments, and so on. I got completely lost on who said what to whom. And we are only a month and a half into the competition!

For our collective convenience, I present the definitive list of who said what to whom regarding the NoCoug SQL Challenge. Bravely stolen from Iggy Fernandez by yours truly.

Breaking news:

  1. More prizes! You remember that  the winner will receive six books by his choice from Apress catalog. Apress just donated six more eBooks for the runner ups. I guess they are also overwhelmed by all the great solutions that were sent.
  2. The end date for the competition was not published, but rumors are that it’ll probably happen at the end of June. So if you have a brilliant solution you did not send your results in yet (SQLchallenge@nocoug.org) , it is time to do so.
  3. Remember the “SQL vs. PL/SQL” debate? (If you don’t there are links below). Iggy asked a bunch of oracle gurus their opinion on the topic and publish an article about it on NoCoug’s Journal (I think we are the only Oracle User Group with a quarterly journal).  NoCoug members will get a copy in the mail, as usual. Non-Members can either register really fast or ask their friends if they can borrow a copy 🙂

Contest announcements:

  1. Official announcement: http://www.nocoug.org/SQLchallenge/FirstSQLchallenge.pdf
  2. Write-up on Chen Shapira’s blog: https://prodlife.wordpress.com/2009/03/31/first-international-nocoug-sql-challenge/

Oracle solutions:

  1. Rob van Wijk’s solution using the MODEL clause (Netherlands): http://rwijk.blogspot.com/2009/03/calculating-probabilities-with-n-throws.html
  2. Vadim Tropashko’s solution using Common Table Expressions (USA): http://vadimtropashko.wordpress.com/2009/03/25/variable-number-of-joins/
  3. Laurent Schneider’s solution using CONNECT BY and XMLQUERY (Switzerland): http://www.amazon.com/gp/blog/post/PLNKI2MYB0YCYAUL/#Mx1E6DR4VGI3O9Z
  4. Craig Martin’s solution using CONNECT BY and logarithms (USA): http://www.amazon.com/gp/blog/post/PLNKI2MYB0YCYAUL/#Mx26NK96OBG2FLQ
  5. Alberto Dell’Era’s two solutions using Fourier transforms (Italy): http://www.adellera.it/investigations/nocoug_challenge/index.html
  6. Fabien (Waldar) Contaminard’s solution using the multinomial distribution (France): http://www.waldar.org/blog/200904/nocougs-first-sql-challenge

Non-Oracle solutions:

  1. Postgres solution (Romania): http://hype-free.blogspot.com/2009/04/nocoug-sql-challenge.html

Commentaries:

  1. Comment on Alberto Dell’Era’s solution by Jonathan Lewis: http://jonathanlewis.wordpress.com/2009/04/15/model/
  2. Analysis of Alberto Dell’Era’s solution by Iggy Fernandez: https://prodlife.wordpress.com/2009/03/31/first-international-nocoug-sql-challenge/#comment-2772

Debate on the merits of SQL and PL/SQL

  1. Rob van Wijk: http://rwijk.blogspot.com/2009/03/choosing-between-sql-and-plsql.html
  2. Chen Shapira: https://prodlife.wordpress.com/2009/03/20/would-you-rather-maintain-sql-or-plsql/
  3. Laurent Schneider: http://laurentschneider.com/wordpress/2009/03/to-sql-or-to-plsql.html
  4. H.Tonguç Yýlmaz: http://tonguc.wordpress.com/2009/03/21/hot-discussion-sql-or-plsql/

Whew, quite a link collection. Now I’m obsessing whether or not I missed anyone!
If I did, let me know if the comments. I’ll keep editing this post with the latest info.


First International NoCoug SQL Challenge

Spring is in the air. Days are longer, sun is shining, flowers blooming and a new SQL Challenge arrives to our community.

Its a nice coincidence. Exactly one year ago, Eddie Awad and I closed the Obfuscated SQL contest, and put the entries up to final vote. I posted an analysis of the entries and Laurent posted a funky photo of me. Eddie and I talked of running another Obfuscated SQL contest this year, but I have to admit that I forgot all about it.

But spring is not really spring without a juicy SQL challenge to whack your brains! So I think the SQL community is lucky that Iggy Fernandez talked both NoCoug and Apress into sponsoring a new SQL challenge.

The challenge is open to everyone. Not just NoCoug members, not just Oracle community. Anyone who can SQL, can enter. Did I mention there are prizes? Apress will give 6 books to the winner, by his choice.

Oh, and Dan Tow is one of the judges! He’s a great SQL expert, so I expect interesting commentary about the problem and the various solutions.

You can read all the rules in the NoCoug website. But the challenge itself is quite simple:

Consider a twenty sided die. Each side of the die has a different number on it. For each side, there is a certain probability that the die will fall on that side. Iggy collected all the probabilities in a table. Now he wants to know what is the probability that he will get certain sums when the die is thrown N times is succession.

Go ahead. Try to solve it. Not as easy as it looks? When Iggy first sent me the problem description, I thought it was impossible. I sent it to few friends, just to annoy them. And turned out that it is far from impossible. In the process, the challenge description was leaked, and we have few terrific solutions already: Rob Van-Wijk used Model, Laurent used connect-by and Vadim used 11gR2!

Iggy did not publish his own solution (and he can’t participate anyway since he is one of the judges), but to tease me a bit, he sent me a graph with the time it took his code to run for different numbers of dice throws. He has O(N^2) complexity, not exponential! From what I’ve seen not one of the published solutions is even close!

probabilities

I hope for a lot of participants. I hope someone finds a solution that is at least as fast as Iggy’s, because he’s been gloating about it for two weeks now. I also hoping for lots of non-Oracle participants, because I’m curious how different their solutions will be, or maybe SQL is just SQL and all the solutions will look the same.


Would You Rather Maintain SQL or PL/SQL?

Rob Van Wijk wrote about the reputation that SQL has as unmaintainable. He claims that SQL is not inherently less maintainable, but rather that developers who say that SQL is difficult to maintain are simply not as skilled in SQL as they are in PL/SQL, and they certainly don’t document their SQL to the same level they document their PL/SQL.

I mostly agree with Rob. It is not a coincidence that developers are more comfortable with PL/SQL than with SQL. PL/SQL is a procedural language. Nearly everyone learned programming with a procedural language, and probably programmed in a procedural language for at least three years before being introduced to SQL.

SQL is a different beast. It is a data maniplation language based (roughly) on the relational model, and most of its operators are based in relational algebra. Quite different from Pascal, C or Java. Its not really surprising that most programmers are never entirely comfortable around SQL and keep being surprised by its behavior.

I think there are few other reasons that SQL is percieved as less maintainable though:

  1. Readability of SQL is heavily dependent on the underlying data model. A good model will lead to more readable and maintainable SQL. When the data model does not match the reports that need to be generated, PL/SQL may have an advantage, its readability being less impacted by the data model.
  2. Differences in how SQL statements are written can lead to differences in performance. It is somewhat true in PL/SQL as well, but I think SQL is more sensitive. Performance often takes priority over readable code. Not using subquery factoring because it results in sub-optimal optimizer decision is one consequence. Of course, the reasons for writing SQL in a specific way are very rarely documented, making the code even less maintainable.
  3. As a result of #2 – developers who will gladly refactor PL/SQL code to make it more readable, will think at least twice before refactoring SQL. It is nearly trivial to predict how a given code change will impact PL/SQL performance, but nearly impossible to do the same for SQL (unless you are Jonathan Lewis).
  4. Debugging is a big issue with SQL. In PL/SQL it is easy to add few put_line messages to track what exactly your code is doing and why. There are even nice commercial debuggers. Finding bugs in SQL is still a bit of a black art that must be done without any tools or methods.

I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better, but at least the discussion is a bit more complete.


Pivot – You are doing it wrong

Suppose you have a table like this:

drop table t_;

create table t_ (
  nm Varchar2(20),
  pr Char    ( 7),
  vl Number  
);

insert into t_ values ('company 1','2003-06', 10);
insert into t_ values ('company 1','2003-07', 29);
insert into t_ values ('company 1','2003-08', 39);
insert into t_ values ('company 1','2003-09', 41);
insert into t_ values ('company 1','2003-10', 22);

insert into t_ values ('company 2','2003-06', 13);
insert into t_ values ('company 2','2003-07', 17);
insert into t_ values ('company 2','2003-08', 61);
insert into t_ values ('company 2','2003-09', 55);
insert into t_ values ('company 2','2003-10', 71);

insert into t_ values ('company 3','2003-06', 33);
insert into t_ values ('company 3','2003-07', 18);
insert into t_ values ('company 3','2003-08', 27);
insert into t_ values ('company 3','2003-09',  5);
insert into t_ values ('company 3','2003-10', 32);

(Thanks to René Nyffenegger for the helpful create script)

And you want this output:

NM                          JUL        AUG        SEP
-------------------- ---------- ---------- ----------
company 2                    17         61         55
company 3                    18         27          5
company 1                    29         39         41

If you are using 10g or older versions, you can use Tom Kyte’s decode trick. In 11g, we have an official keyword to solve this problem.

What you should never, ever do, is this:

  select
    t1.nm,
    t1.vl jul,
    t2.vl aug,
    t3.vl sep
  from
    t_ t1,
    t_ t2,
    t_ t3
  where 1=1
  and t1.nm=t2.nm
  and t2.nm=t3.nm
  and t1.pr='2003-07'
  and t2.pr='2003-08'
  and t3.pr='2003-09'
  group by t1.nm, 
    t1.vl,
    t2.vl,
    t3.vl

And if you do this, and if t_ has more than 15 rows, and if you actually need more than 3 columns, don’t be surprised if the performance may be slightly disappointing.


Splitting Comma Seperated String to Rows

Got the following email from one of my users:

I am trying to write a SQL select to get comma delimited values from one field in rows.

For example I have a Table T1
create table t1(a  varchar2(30),
b  varchar2(30));

insert into t1 values ('A','27.68%,2.78%,69.55%');

SELECT A, B FROM T1;
A       B
=====  =====
A  27.68%,2.78%,69.55%

Can you help me write a SQL which will give the following output?

A        PERCENT
======  ========
A   27.68%
A    2.78%
A   69.55%

My solution is rather ugly, but it works (on 10g and above) and makes no assumptions about the number of elements in the list :

With t2 as (select A,','||B||',' as B from t1)
SELECT A,SUBSTR(B,INSTR(B,',',1,LEVEL)+1,INSTR(B,',',1,LEVEL+1)-INSTR(B,',',1,LEVEL)-1)
FROM t2
CONNECT BY LEVEL <= length(regexp_replace('[^\,]',''))-2 ;

Senior DBA had a simpler solution: “Just replace all the commas with new lines”.


Obfuscated SQL

I’m a closet C programmer, and an avid follower of the Obfuscated C Contest . SQL is an interesting and diverse language. No reason why database professionals can’t have their own obfuscated code.

So I shared my idea. And the renowned community organizer Eddie Awad made it an official contest! And the wonderful SQL gurus all contributed the scariest queries you’ve ever seen, and now it is time to vote!.

Rob van Wijk deserves a special reward for having the first entry – less than a day after the contest was announced! And as Paul Gallagher noted, it is pretty enough to be on a t-shirt. I think that Rob’s entry set the tone for the rest of the competition – XML, regexps, chr, ascii, connect by and model became prerequisites for any self respecting  obfuscated query.  Rob also included  use of pivot,multiset, messages in the code (‘I LOVE SQL’) and silly math tricks that I enjoyed much (where ln(34e5)=exp(ascii(‘@’)-2e2) is so much nicer than where 1=0 ). I was a bit disappointed that I managed to decipher most of the query and its tricks in an hour or two. Next year we expect worse 🙂

Shoblock thought that OTN is full of obfuscated queries and stepped up with an example. I beg to differ. Not any plain old horrible query deserves to be an obfuscated entry. A true obfuscated query must be horrible and cleverly impressive at the same time. It should make you say “wow” as well as “yuck” and hopefully help you learn something other than how not to write code while you read it.

Laurent Schneider raised the bets by submitting a query that runs on 10.1.0.5 only and contains no white spaces. It deserves a reward for most tricks with fewest characters and also for scariest query. I would not want to meet this monster in my source control on a dark night. I cheated a bit and got an in-depth tour of the query from the master himself. I hope Laurent will blog his explanation, but if he won’t I’ll do it. For now, I’ll point out the use of double connect by, dburitype and stats_mode that make the query unique. I’ll also mention that the obscure version was chosen not just to annoy me. 10.1 is the only version where one can use the model clause and still have group by sort the results of the query.

Frank Zhou gets the award for the most useful query. While the other entries only print nice messages, Zhou’s entry solves four different problems! As far as pure obfuscation goes, I think this entry is less impressive than others. Of course, the good and clear formating also helped in making this just too readable to be obfuscated.

Volder’s entry arrived just in time before the deadline, and it is both pretty and clever.  I think it deserves the overall winner award. The query contains all the required tricks and then adds new ones. It uses comments for prettier formatting, it uses XQuery, it uses subquery factoring and nested groups in regular expression. I’m still working on understanding all of it, but I’m sure in few hours I’ll get it and be a better person for it (if the headache won’t get me first). Volder, any chance you’ll be finishing the layout so we can see the turtle? The formatting is still great even without the turtle. The use of lots of horizontal spaces to separate the sun from the tree makes it very difficult to real.

I just remembered that no prize was mentioned. Any suggestions? A bottle of Stoli Crystal maybe?


Debugging SQL

I love SQL. Let me count the ways:

  1. It really fits the relational database model. You only have  to work with Hibernate for few days to appreciate the beautiful simplicity of SQL.
  2. SQL doesn’t have direct memory access. Which means that I don’t have all those freaky  bugs that C developers have, where you write random chunks of memory by mistake and get totally random results that are impossible to debug.
  3. Even more important – the concurrency model is relatively simple and as a bonus the DB will detect and resolve deadlocks for you.
  4. It is easy to read. There are no “obfuscated SQL” contests for a reason.
  5. Eye on Oracle got few more reasons, just in case you need convincing.

However, one of the things I really wish I had is a nice SQL debugger, because I’m tired of picking apart complicated SQL statements by hand just to find the bad join that causes duplicate rows or missing rows or whatever.

I realize that debugging SQL is not as trivial as it would be for C, because an SQL statement does not contain a list of well defined operations that one can step though one by one while watching key variables.  But there are a bunch of logical set operations that make up each query. Even if Oracle’s implementation of them can get complicated – an “exists in” is logically a nested loop, join is a set product which can then get filtered, etc.

So, why not build a tool that allows you to walk through this operation. The tool should allow you to see the data set after each operation. This way it should be much easier to find the join that breaks everything. After all, we go through a very similar process every time we try to debug a query. It is clear that we need a tool that will support the task that we do so many times a day.


Rounding dates

We are shutting down most of the SQL Servers, moving our applications and customers to Oracle. This move has been going on for a while and is targeted to finish on the end of the month. One of the applications that will have to move is our SLA calculations. Thats an important and highly visible app – our executives are following the results, so we really don’t want to mess up with this one. The downside is that moving it means translating a bunch of code written in T-SQL for SQL Server to Oracle’s PL/SQL. To make things even better, the application involves a lot of date manipulations. SQL Server and Oracle have very different approaches to dates. So, this task has been tossed back and forth for a while now, because no one really wanted to get his hands dirty with this, until it firmly lended on my desk, with 2 days deadline.

So, one of the things that had to be translated was a query that groups the last day of data into 15 minute intervals. I spent few minutes trying to translate the code into Oracle in a operation by operation basis, before I gave up and decided to write it from scratch. Since my solution is both generic (will work on any minute interval, not just 15) and much cuter than the other solutions I found in various forums, I decided to post it here:

Round to previous 15 minute (i.e. 4:10 will become 4:00):
TRUNC(sysdate,'HH24') + (to_char(SYSDATE,'mi')-mod(TO_char(SYSDATE,'mi'),15))/(24*60)

Round to next 15 minutes (4:10 will become 4:15):
trunc(sysdate,'HH24') + (to_char(SYSDATE,'mi')+15-mod(TO_char(SYSDATE,'mi'),15))/(24*60)