Cumulative Distribution

Laurent Schneider sent me a link to an interesting discussion in AskTom.

How do we calculate Cumulative Normal Distribution? This means calculating the probability that a random variable from a normal distribution, will end up equal to or less than X.

I’d love to know why the OP needed that. While Normal Distribution is very popular, most realistic statistical applications would use Student’s T distribution. This is so common that Oracle contains a built in T Test function. Maybe this would be useful to the OP?

Regardless of Tom’s flippant reply, it is actually a difficult question. Knowing the cumulative normal distribution means integrating the normal (Gaussian) function. Which is impossible. So in a way, the correct answer to the OP’s question is “you can’t”.

But thats not correct either, while there is no “general” function for the integral of the normal distribution, you can manually calculate the integral for any specific value on the function. Lucky for us, Gauss had grad students who did this hard work, and until very recently statisticians referred to books with tables containing these results whenever they needed to know the probabilities.

Which brings us to another good answer – find a website with this table, load the table into Oracle, and use it as much as you want. No PL/SQL needed.

Another interesting way to calculate the cumulative normal distribution in Oracle would be to use Oracle’s dbms_random.normal to generate an approximation of the normal distribution and calculate the cumulative distribution on the result. Of course it is not very accurate, but it is a lot of fun:

select n,cume_dist ( ) over (order by n) cume_dist from
(select dbms_random.normal n from dual connect by level<=100);

In the AskTom thread, Georg from Germany linked to a function that approximates the cumulative normal distribution, and that site contained link to a paper with even better approximation. This is probably how statistics software (SPSS and its friends) calculates probabilities.

Interesting question, many good answers.

***

Frits Hoogland also managed to write a long post about a short question with a short answer:
How an ASM diskgroup is found by the database.

About these ads

2 Comments on “Cumulative Distribution”

  1. Chen,

    The query above generates 100 normal random values, sort them and give them a value 0.01 0.02 0.03 … 1.00

    How it then applied to your data?

    select y,z from dual model
    return updated rows
    dimension by (0 x) measures (0 y,0 z,7.67e+183d k)
    (y[for x from 1 to 100 increment 1]=dbms_random.normal,
    z[x>0]=row_number() over (order by y[cv()])/100)
    order by z;

    Well, I have 100 rows, but where are my datas? Any example with EMP or how to apply this table to your dataset would be appreciated.

    th@n:x al0t

  2. [...] — Chen Shapira, just a simple DBA on a complex production system, has some thoughts on cumulative distribution. “How do we calculate Cumulative Normal Distribution? This means calculating the probability [...]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 3,111 other followers