First International NoCoug SQL Challenge

Spring is in the air. Days are longer, sun is shining, flowers blooming and a new SQL Challenge arrives to our community.

Its a nice coincidence. Exactly one year ago, Eddie Awad and I closed the Obfuscated SQL contest, and put the entries up to final vote. I posted an analysis of the entries and Laurent posted a funky photo of me. Eddie and I talked of running another Obfuscated SQL contest this year, but I have to admit that I forgot all about it.

But spring is not really spring without a juicy SQL challenge to whack your brains! So I think the SQL community is lucky that Iggy Fernandez talked both NoCoug and Apress into sponsoring a new SQL challenge.

The challenge is open to everyone. Not just NoCoug members, not just Oracle community. Anyone who can SQL, can enter. Did I mention there are prizes? Apress will give 6 books to the winner, by his choice.

Oh, and Dan Tow is one of the judges! He’s a great SQL expert, so I expect interesting commentary about the problem and the various solutions.

You can read all the rules in the NoCoug website. But the challenge itself is quite simple:

Consider a twenty sided die. Each side of the die has a different number on it. For each side, there is a certain probability that the die will fall on that side. Iggy collected all the probabilities in a table. Now he wants to know what is the probability that he will get certain sums when the die is thrown N times is succession.

Go ahead. Try to solve it. Not as easy as it looks? When Iggy first sent me the problem description, I thought it was impossible. I sent it to few friends, just to annoy them. And turned out that it is far from impossible. In the process, the challenge description was leaked, and we have few terrific solutions already: Rob Van-Wijk used Model, Laurent used connect-by and Vadim used 11gR2!

Iggy did not publish his own solution (and he can’t participate anyway since he is one of the judges), but to tease me a bit, he sent me a graph with the time it took his code to run for different numbers of dice throws. He has O(N^2) complexity, not exponential! From what I’ve seen not one of the published solutions is even close!

probabilities

I hope for a lot of participants. I hope someone finds a solution that is at least as fast as Iggy’s, because he’s been gloating about it for two weeks now. I also hoping for lots of non-Oracle participants, because I’m curious how different their solutions will be, or maybe SQL is just SQL and all the solutions will look the same.


23 Comments on “First International NoCoug SQL Challenge”

  1. the rule discourages use of non-standard SQL, so I will need to find something else than CONNECT BY without prior or XQUERY to stay in competition…

    Vadim solution rules… it is extremly fast, concise and brillant!

  2. Iggy Fernandez says:

    The rules don’t prohibit the use of non-standard SQL. Scalability is also a consideration. The only solutions that are explicitly ineligible are those that use procedural loops to multiply probabilities.

  3. I see that “The competition will close at a time determined by the organizers”; is it possible to know at least the order of magnitude of the difference between the competition closing date and now ? Thanks!

  4. Iggy Fernandez says:

    Hi, Alberto,

    The tentative date is June 30; that is, three months from now. To keep up the interest, we’re thinking of publishing the solutions on the NoCOUG website as they come in. Those who solve the problem might also post their solutions on their own blogs as have Laurent Schneider, Rob van Wijk, and Vadim Trophashko.

    Iggy

  5. Iggy Fernandez says:

    The PostgreSQL folks have come up with a solution that uses the PostgreSQL equivalent of Oracle’s table functions. See http://hype-free.blogspot.com/2009/04/nocoug-sql-challenge.html.

  6. Here is my O( N log N ) solution (FFT using the Model clause):

    http://www.adellera.it/investigations/nocoug_challenge/index.html

    Hope you’ll enjoy it 🙂

    • prodlife says:

      @Alberto
      Oh, wow.
      I hope the judges will not deduct points for not understanding the solution!
      I’ll try to spare few hours on the weekend refreshing my memory of FFT with the CMLab pages. It does look interesting, and O(NLogN) is impressive!
      I also enjoyed your detailed analysis of the solution.

  7. Indeed, it is a very interesting problem – I have really enjoyed but the statistical analysis and the programming part, I’ve had a lot of fun 🙂

  8. Iggy Fernandez says:

    Alberto has actually published two solutions. The first solution uses common table expressions and will work with most databases–including SQL Server and DB2–after a little tweaking. See adellera_FT.sql in the zip file provided by Alberto at http://www.adellera.it/investigations/nocoug_challenge/nocoug_challenge_supporting_material.zip. The second solution provided by Alberto uses the Model clause to implement a computational technique called the “fast Fourier transform” but is harder to read than the first solution.

    Alberto recognized that the contents of the die table define a mathematical function and that the process of joining the table with itself and grouping the results is the so-called “convolution” of this function with itself. Throwing the die N times is therefore equivalent to performing N – 1 convolutions.

    If N=2, we have to perform one convolution as follows.

    SELECT face_value,
    SUM (probability) AS probability
    FROM (SELECT d1.face_value + d2.face_value AS face_value,
    d1.probability * d2.probability AS probability
    FROM die d1,
    die d2)
    GROUP BY face_value;

    If N=3, we have to perform two convolutions. This is best expressed using common table expressions as follows. Notice that the definition of the second convolution references the first convolution.

    WITH

    first_convolution AS

    (
    SELECT face_value,
    SUM (probability) AS probability
    FROM (SELECT d1.face_value + d2.face_value AS face_value,
    d1.probability * d2.probability AS probability
    FROM die d1,
    die d2)
    GROUP BY face_value
    ),

    second_convolution AS

    (
    SELECT face_value,
    SUM (probability) AS probability
    FROM (SELECT d1.face_value + d2.face_value AS face_value,
    d1.probability * d2.probability AS probability
    FROM first_convolution d1,
    die d2)
    GROUP BY face_value
    ),

    SELECT *
    FROM second_convolution;

    The most common solution of the problem requires an N-way Cartesian join. Convolutions have obvious advantages over an N-way Cartesian join because they keeps the size of intermediate results in check. The question is how to compute the required N convolutions with a single SQL statement if the value of N is not known in advance. One solution is to invoke a “table function” recursively as was done at http://hype-free.blogspot.com/2009/04/nocoug-sql-challenge.html; that is, we have to resort to procedural programming. The drawback of that approach is that recursion is fairly expensive. I believe that Oracle limits the depth of recursion to 50.

    Alberto was able to avoid procedural programming using a “Fourier transform;” that is, a certain function whose definition is derived from the original function. The Fourier transform has the property that the transform of the convolution of functions is the simple product of the individual transforms. Therefore, the Fourier transform of the N-way convolution of our function with itself is the n-th power of the Fourier transform of the function. The Fourier transform in our case is easy to compute and the n-th power of the Fourier transform is also easy to compute. At this point, Alberto has the Fourier transform of the N-way convolution instead of the N-way convolution itself. To obtain the N-way convolution itself, he computes the “inverse Fourier transform” of the Fourier transform that he has already derived.

    Fourier transforms and inverse Fourier transforms are explained at http://mathworld.wolfram.com/DiscreteFourierTransform.html. You will still have to work hard to understand the translation into SQL, including the calculation of real and imaginary components and Cartesian and Polar forms.

    P.S. Seven solutions of the problem have now been published.

    Laurent Schneider’s solution using CONNECT BY and XMLQUERY: http://www.amazon.com/gp/blog/post/PLNKI2MYB0YCYAUL/#Mx1E6DR4VGI3O9Z

    Craig Martin’s variation using CONNECT BY and logarithms: http://www.amazon.com/gp/blog/post/PLNKI2MYB0YCYAUL/#Mx26NK96OBG2FLQ

    Rob van Wijk’s solution using the MODEL clause: http://rwijk.blogspot.com/2009/03/calculating-probabilities-with-n-throws.html

    Vadim Tropashko’s solution using Common Table Expressions: http://vadimtropashko.wordpress.com/2009/03/25/variable-number-of-joins/

    PostgreSQL solution using table functions: http://hype-free.blogspot.com/2009/04/nocoug-sql-challenge.html

    Alberto Dell’Era’s solutions using Fourier transforms and fast Fourier transforms: http://www.adellera.it/investigations/nocoug_challenge/index.html.

    The competition has spawned a debate on the merits of SQL v/s PL/SQL.

    Rob van Wijk: http://rwijk.blogspot.com/2009/03/choosing-between-sql-and-plsql.html
    Chen Shapira: https://prodlife.wordpress.com/2009/03/20/would-you-rather-maintain-sql-or-plsql/
    Laurent Schneider: http://laurentschneider.com/wordpress/2009/03/to-sql-or-to-plsql.html
    H.Tonguç Yýlmaz: http://tonguc.wordpress.com/2009/03/21/hot-discussion-sql-or-plsql/

  9. Fabien Contaminard says:

    I did send a mail at “SQLChallenge@nocourg.org” but today i received a Mail Delivery Subsystem with a DNS Error: Domain name not found.

    How should i proceed to submit my non-scalable but not seen solution ?

  10. […] also brings news of the First International NoCoug SQL Challenge. Although the contest is still open, Rob van Wijk thinks he may have found the winner in Alberto […]

  11. Fabien Contaminard says:

    Yeah, now i feel really dumb 😀
    Thanks.

  12. Iggy Fernandez says:

    Thanks, Fabien. We have received your solutions.

  13. Iggy Fernandez says:

    Fabien Contaminard has found a polynomial-time solution based on the multinomial probability distribution. Refer to http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_multinomial_distri.htm for an explanation. The solution is for a six-sided die but can be easily generalized to the case of a twenty-sided die.

    WITH

    — Compute factorials

    factorial AS

    (
    SELECT 0 AS l,
    1 AS f
    FROM DUAL
    UNION ALL
    SELECT LEVEL AS l,
    ROUND( EXP( SUM (LN (LEVEL)) OVER (ORDER BY LEVEL ASC))) AS f
    FROM DUAL
    CONNECT BY LEVEL <= :n
    ),

    — Compute multinomial coefficients

    multinomial AS

    (
    SELECT f1.l as l1,
    f2.l as l2,
    f3.l as l3,
    f4.l as l4,
    f5.l as l5,
    f6.l as l6,
    f7.f / (f1.f * f2.f * f3.f * f4.f * f5.f * f6.f) AS m
    FROM factorial f1
    CROSS JOIN factorial f2
    CROSS JOIN factorial f3
    CROSS JOIN factorial f4
    CROSS JOIN factorial f5
    CROSS JOIN factorial f6
    CROSS JOIN (SELECT f FROM factorial WHERE l = :n) f7
    WHERE f1.l + f2.l <= :n
    AND f1.l + f2.l + f3.l <= :n
    AND f1.l + f2.l + f3.l + f4.l <= :n
    AND f1.l + f2.l + f3.l + f4.l + f5.l <= :n
    AND f1.l + f2.l + f3.l + f4.l + f5.l + f6.l = :n
    )

    — Compute probabilities

    SELECT face_value,
    SUM (probability) AS probability
    FROM (SELECT d1.face_value * m.l1 + d2.face_value * m.l2 + d3.face_value * m.l3 + d4.face_value * m.l4 + d5.face_value * m.l5 + d6.face_value * m.l6 AS face_value,
    POWER (d1.probability, m.l1) * POWER (d2.probability, m.l2) * POWER (d3.probability, m.l3) * POWER (d4.probability, m.l4) * POWER (d5.probability, m.l5) * POWER (d6.probability, m.l6) * m.m AS probability
    FROM multinomial m
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 1) d1
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 2) d2
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 3) d3
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 4) d4
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 5) d5
    CROSS JOIN (SELECT face_value, probability FROM die WHERE face_id = 6) d6)
    GROUP BY face_value
    ORDER BY 1;

  14. Singer Wang says:

    I’m still thinking ‘Central Limit Theorem’. So Model + CLT (n > ~50) and we have a algorithm that’s O(N).

  15. […] my solution about NoCoug first SQL Challenge. You’ll find some very very nice stuff into the comments of this post, published by Iggy […]

  16. […] was my solution about NoCoug first SQL Challenge. You’ll find very very nice stuff into the comments of this post send by Iggy […]

  17. […] I was recently randomly browsing Laurent Schneider’s blog, and saw a link about a riddle to be solved in plain SQL. The challenge is clear, and very shiny solutions were already found by Laurent Schneider, Craig Martin, Rob van Wijk and Vadim Tropashko. You can find links to these solutions on the comments of Chen Shapira related topic. […]

  18. […] mi è ben chiaro perché, ma buona parte degli aggiornamenti sono stati discussi nei commenti al post sulla notizia del concorso sul blog di Chen Shapira. Ad esempio li Iggy spiega che il termine del concorso è al 30 […]

  19. […] channelUnexpected Side Effect of Table ShrinkingAboutTroubleshooting Broken ClusterwarePresentationsFirst International NoCoug SQL Challengealter table … shrink spaceShrinking Tables Once […]

  20. […] The First International NoCOUG SQL Challenge sponsored by Apress is now closed. Thanks to Chen Shapira for helping to publicize the challenge. The winner will be announced in the next issue of the […]


Leave a reply to Well, It’s About Time ! at Waldar’s SQLing and Datawarehousing Place Cancel reply