Just Return Any Random Row

Sometimes I see a developer try to run something like “select deptno,sal from emp group by deptno”. I usually ask the developer, “But employees in the department have many different salaries. Which one do you want?”, and sometimes I get an amazing answer: “I don’t really care, I just want to see any random salary.” Usually I tell her to aggregate with “max” or something similar if she doesn’t care about the result.

But today I was really annoyed, so I wrote a custom aggregate function that returns a random salary. I admit it is not as useful as str_agg, but there seem to be few developers interested in this feature.

The main challenge was to make the aggregation truly random. When you aggregate, you always have the current aggregation value and a new one. I have to randomly choose one of them – but I can’t make it a simple 50/50 selection.

Suppose I have three rows. The way aggregation works, I first take two rows and flip a coin to pick one. Now I have a current value – and I have to take the third row and decide if I want to keep the current value or the new one. I can’t flip the coin again – because if the third row has 50% chance to be selected, this means the first and second rows only have 25% chance each. Not fair. So I need to give the third row 1/3 chance, and the current value 2/3.

create or replace type agg_t as object (

    curr_value number,
    running_count number,

    static function ODCIAggregateInitialize(sctx  in out agg_t)
                    return number,

    member function ODCIAggregateIterate   (self  in out agg_t,
                                            new_value in number)
                    return number,

    member function ODCIAggregateTerminate (self         in     agg_t   ,
                                            return_value    out number,
                                            flags        in number      )
                    return number,

    member function ODCIAggregateMerge(self in out agg_t,
                                       ctx2 in agg_t    )
                    return number
);
/

create or replace type body agg_t is 

    static function ODCIAggregateInitialize(sctx in out agg_t)
        return number is
    begin
        sctx := agg_t(null,0);
        return ODCIConst.Success;
    end;

    member function ODCIAggregateIterate(
      self in out agg_t, new_value in number)
        return number is
    begin
        running_count := running_count + 1;
        -- keep the new value with probability 1/running_count
        -- (abs() because dbms_random.random can return negative numbers,
        --  which would skew the selection)
        if mod(abs(dbms_random.random), running_count) = 0 then
            curr_value := new_value;
        end if;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateTerminate(self in agg_t,
        return_value out number, flags in number) return number is
    begin
        return_value := curr_value;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateMerge(self in out agg_t,
        ctx2 in agg_t) return number is
    begin
        -- keep ctx2's value with probability proportional to the number
        -- of rows it aggregated (abs() for the same reason as above)
        if mod(abs(dbms_random.random), running_count + ctx2.running_count) < ctx2.running_count then
            curr_value := ctx2.curr_value;
        end if;
        running_count := running_count + ctx2.running_count;
        return ODCIConst.Success;
    end;
end;
/

create or replace function agg_random (input number) return number
    parallel_enable aggregate using agg_t;
/

So easy! I love user defined aggregations! But you have to be careful when writing them. I accidentally replaced "number" with "varchar" somewhere in the code and got a lovely error message when I tried to run it:
SQL> select deptno,agg_random(sal) from emp group by deptno;
select deptno,agg_random(sal) from emp group by deptno
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2104
Session ID: 138 Serial number: 1496

And in the alert log: Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x0] [PC:0x3D31168, ___intel_new_memcpy()+40]
I segfaulted! And you can imagine how much fun it was to debug my PL/SQL code using the trace file...

Anyway, after all the debugging was done, it was time to show the code to my developer:


SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            1300
        20            2975
        30            1600

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            3000
        30            2850

But I still wasn’t happy. What if I have a bug and the selection is not random enough? After all, the values in the example seem a bit repetitive.

No problem. I’ll use my function on a table containing the numbers 1 to 99, pick a few random numbers, and then use the well-known chi-square test to check whether my random selection matches the uniform distribution.

SQL> create table r1 as select level l from dual connect by level<100;

Table created.

SQL> create table r2 (n number);

Table created.

SQL> insert into r2 select agg_random(l) from r1 group by ();

1 row created.

-- repeated 20 times

SQL> select count(*) from r2;

  COUNT(*)
----------
        20
        
        
SQL> DECLARE
  2   sig   NUMBER;
  3   alpha NUMBER := 1;
  4   beta  NUMBER := 99;
  5   ttype VARCHAR2(20) := 'CHI_SQUARED';
  6  BEGIN
  7    dbms_stat_funcs.uniform_dist_fit('SCOTT', 'R2', 'N', 'DISCRETE', ttype, alpha, beta, sig);
  8
  9    dbms_output.put_line(sig);
 10  END;
 11  /
X-squared value : .9000000000000000000000000000000000000026 
Degree of freedom : 18.0000000013919621366679664

PL/SQL procedure successfully completed.

A 90% probability of getting my result, assuming the function is good. I’m so happy. I knew that getting a degree in statistics was not a complete waste of time 🙂


Looking for great PL/SQL

Steve Yegge is a developer who writes way too much. Each one of his blog posts can easily take an hour to read, but they tend to contain nice gems that make the long read worth the time.

Today I spent an hour reading about 50% of a fairly interesting article about how to practice programming. His main point is that just programming on the job will not make you a better programmer; you have to actively strive to improve. (I just saved you an hour of reading!) One of his suggestions on how to become a better programmer is this:

Read through someone else’s code for 20 minutes. For this drill, alternate between reading great code and reading bad code; they’re both instructive. If you’re not sure of the difference, ask a programmer you respect to show you examples of each. Show the code you read to someone else, and see what they think of it.

Sounds like good advice. Now I only have to find great and terrible code examples. Well, if I want to practice C, this is not an issue – for great code examples I read Linux kernel code, and for awful examples I read code that I wrote 7 years ago (and it looks worse every year).

But for PL/SQL, I’m a bit stuck. I can still read my own code for bad examples, but where can I find examples of great code? My colleagues write nice enough code, but we have our own coding style and standards, and if something is wrong with these, it will be difficult to tell without outside input. Somehow, there is simply no open-source code written in PL/SQL that I can read to get a good idea of how PL/SQL should be written.

The closest I’ve found is the code examples in Steven Feuerstein’s PL/SQL book. These have been extremely helpful in teaching me how to handle exceptions, iterate through cursors, work with associative arrays, etc. But these are just examples; I can’t learn how to cleanly build a fairly large working module from them. It’s nothing like reading through the Linux kernel.

I think the issue is somewhat related to the problem Tyler Muth raised about the lack of reusable PL/SQL modules. Oracle has an amazing community, but for some reason we don’t share our code and we don’t participate in open-source community projects. Most of our community efforts revolve around helping each other learn new features and solve performance issues.

We are probably missing an opportunity to help each other resolve performance problems before they occur.


Why Application Developers Think Differently Than DB Developers

It is all about programming practices. Very early in their training application developers learn three important paradigms:

  1. Generality – Solutions should work in as many new situations as possible
  2. Premature optimization is the root of all evil
  3. Goto considered harmful

These are very good programming practices that work great in many development situations. Except when it comes to databases. With databases, the first two practices are trouble. Here’s how:

  1. Application developers want their code to be very general and adaptable. So they can’t depend on any vendor-specific DB features. This mostly means that the developers will be forced to reinvent the wheel at every step of the way, or write a huge, expensive database layer, or – even worse – use Hibernate.
  2. More generality: Often application developers think that the idea of having separate tables for different conceptual objects, and different columns for specific attributes of these objects, is a form of evil hard-coding. Their solution will be to use a few “generic” tables, like this one:
    create table things (thing_id number,
    thing_name varchar(4000),
    thing_value varchar(4000),
    thing_data_type_id number);

    Or sometimes they will add columns named userdata_1 … userdata_n to an otherwise well-designed table. Just in case. (A sketch of what querying such a generic table looks like follows this list.)
  3. Of course, if someone points out that their design will never scale and will cause the application to run very slowly, the application developer will scoff: “This is premature optimization! Knuth said it is evil! If it turns out slow, we will get the DBA to add a few indexes and buy more memory.”
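
Here is a sketch of what reading a single “thing” back out of that generic table looks like (the attributes – a customer name and a credit limit – and the id 42 are hypothetical, purely for illustration). Every attribute costs another self-join, and nothing stops thing_value from holding ‘banana’ where a number was expected:

select n.thing_value as customer_name,
       c.thing_value as credit_limit
from   things n, things c
where  n.thing_id   = c.thing_id
and    n.thing_name = 'customer_name'
and    c.thing_name = 'credit_limit'
and    n.thing_id   = 42;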

DB developers have a different view of things. For us, performance and scalability are not features number 18 and 27 on our list. For databases, performance and scalability are everything. There is no other reason to even have a DB, not to mention an expensive one like Oracle.

No optimization is ever premature on a database, and we prefer to start optimizing during design and modeling. The question “which way is faster?” is usually asked when we try to decide between several design ideas or different ways to write the same query. On the other hand, we optimize on an entirely different level than C developers, and only very rarely resort to inlining functions and embedding assembly code.

DB developers remember very well that if the data model is good, all other optimizations are much easier. Often great queries will simply write themselves on a good model, but even when they don’t, it is easy to see which indexes should be added and how queries should be tuned when the underlying model is clear.

We also know that DBs work best on specifics. The more information the DB has about your data, the better you can get it to perform, and the better you can guard the consistency of your data. While application developers like to “generalize” their programs by pretending they don’t have information that they actually have, DB developers prefer to use every piece of information they can get their hands on, and then extract more information and promises from their customers.

Good DB developers are vendor specific. They know one database very well, they know all the features, limitations, concepts and behaviors. They know how to make it run very fast and how to avoid problems that can only happen on this specific DB. They can tune it to death and get maximum value out of every last feature the vendor put in. Being good engineers, we will not let this knowledge go to waste.

No wonder it is very difficult for application developers and DB developers to get along!
What the application developer considers a best practice, DB developers call a complete waste of resources and a hopelessly slow application. When DB developers talk about their best practices, application developers think this is the root of all evil.

Of course, I didn’t do justice to the topics I covered. Tom Kyte has several best-selling books about how to approach database development and design. But I think I had a new insight about the source of the problem: application developers are not stupid, they just had a different indoctrination. I thought it was worth sharing.


Exceptions Should Not Contain Business Logic

Steven Feuerstein’s PL/SQL book has a remarkably good chapter about collections. In addition to showing the different kinds of collections, their methods and how to use them, he also shows some cool non-trivial uses for collections, a taste of the great things you can do.
A few of the examples involve the use of collections for caching. Associative arrays are perfect for caching, and caching is a useful programming technique. One of the examples is as follows (the code is available on O’Reilly’s site, so I don’t think I’m breaking copyright here):

   FUNCTION description (code_in IN hairstyles.code%TYPE)
      RETURN hairstyles.description%TYPE
   IS
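      -- Note: descriptions is an associative array declared at the package
      -- level (not shown in this excerpt); it caches descriptions by code.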
      return_value   hairstyles.description%TYPE;

      FUNCTION desc_from_database RETURN hairstyles.description%TYPE
      IS
         CURSOR desc_cur IS
            SELECT description FROM hairstyles WHERE code = code_in;
         desc_rec   desc_cur%ROWTYPE;
      BEGIN
         OPEN desc_cur;
         FETCH desc_cur INTO desc_rec;
         RETURN desc_rec.description;
      END;
   BEGIN
      RETURN descriptions (code_in);
   EXCEPTION
      WHEN NO_DATA_FOUND THEN
         descriptions (code_in) := desc_from_database;
         RETURN descriptions (code_in);
   END;

This code is short, simple to understand, can be adapted for many different uses, and is an excellent demonstration of the power of associative arrays.

I have just one problem with this code: Most of its logic is implemented inside the exception clause.

Technically speaking, exceptions are very similar to “goto” and historically goto has been used for exactly the same purpose – get out of the current block when things have gone wrong. Exceptions (and goto) make the code easier to read by putting all the clean up work in one place and jumping to this one place whenever needed, but other uses of exceptions can make the code more confusing.

The development standards I’m working with specify that exceptions should be used for handling exceptional conditions, not situations that will occur during the normal flow of the program. The advantage of this standard is that when someone reads the code, she can understand the main logic without jumping back and forth between different blocks. Then when she gets to the exception block she can see which problems you anticipated and how you dealt with those.
How do we know if we did this right? We look at the code and ask ourselves what would happen if we deleted the exception block and ran it in ideal circumstances (enough memory, disk space, smart users, etc.) – would it behave the same from a business logic perspective? It should.

So, I would rewrite the function above as follows:

BEGIN
   IF NOT descriptions.EXISTS (code_in) THEN
      descriptions (code_in) := desc_from_database;
   END IF;
   RETURN descriptions (code_in);
END;

I’m not saying that Steven’s original example is confusing. It is not. I enjoyed the smart use of exceptions. But I know from bitter experience that examples have a way of propagating, and beginners who see this example in the book may copy it without thinking about the finer points of code readability.

If you are not sure what belongs in the exception clause and what doesn’t, keep your business logic out of it. Even if an example in your favorite book is doing the opposite. 99% of the time, your code will be more readable.


From C to PL/SQL

The most dangerous assumptions are those that you are not aware of. We all know we should question assumptions and verify everything, but how can you question the assumptions you don’t even know you have?

C developers who move to PL/SQL development should constantly be aware that they are not in AT&T labs anymore. Of course, constant awareness is easier said than done, and Oracle keeps surprising.

It keeps surprising even after we learn that in Oracle, integers are not really integers and are not faster than decimals – unless they are binary_integers. Binary_integer sounds nice and comfortable; C developers love binary – after all, C is just portable assembly. Don’t get your hopes up, though – you won’t find bit-shift operators in the language. If you ask your co-workers about the missing binary operations, you are in for a lecture: database people have their own optimization methodologies, and they only worry about bits and bytes when counting IO operations.

How do you recognize a C developer who now dabbles in PL/SQL? Ask him how many times the following code will print “… running very slow calculation …”:

-- a scratch table for the example
create table t (n number);

create or replace function expr return number
is
begin
    dbms_output.put_line('... running very slow calculation ...');
    return 10;
end;
/

begin
    for i in 1..expr() loop
        insert into t values (i);
    end loop;
end;
/

A C developer will say “ten times”, and will insist on “optimizing” the loop and write:

declare
    e number;
begin
    e := expr();
    for i in 1..e loop
        insert into t values (i);
    end loop;
end;
/

PL/SQL developers will know that loop boundaries are calculated once, when you first enter the loop. Therefore the line will be printed exactly once, and the second version is not faster than the first.

As they said in Pulp Fiction:
“It’s the little differences. I mean they got the same shit over there that they got here, but it’s just – it’s just there it’s a little different.”

***
Another identifying trait of former C developers is that they are forever interested in low level internals, and particularly memory management.
Lucky for me, Christo Kutrovsky published a paper about this exact topic. It is the perfect paper – crystal clear, it gets into all the interesting and important details, makes no attempt to dumb anything down, and it even has practical conclusions.


Exception Safety for PL/SQL

One of the things I like about C++, is that there are tons of theory around the language. There are many smart people who sit around and write articles about how different language mechanisms are best used. We have a bit of it in PL/SQL, but not nearly enough.

One of the most important articles about exception safety in C++ is Abrahams’ Exception Safety in Generic Components. 90% of it is totally irrelevant to PL/SQL developers, because the smart people at Oracle chose not to inflict templates and generic components on us (at least not yet – who knows what will happen in the next version). However, in this article Abrahams defines the different levels of exception safety that a code component can achieve, and this relates directly to PL/SQL development.

The levels are described in terms of a PL/SQL block called by another block, and what the caller can expect after the called block throws an exception at each safety level.

Exception safety levels, as I translate them from C++ to PL/SQL:

  • No safety: You don’t handle exceptions. When your code breaks you have no idea what the state of your structures and data is, and all kinds of resources, such as cursors, may be left open. Note that you can achieve this state by completely ignoring the possibility of errors, or by using “when others then null”. Be glad that you are developing in PL/SQL and not in C++, because at least Oracle keeps some semblance of data consistency, and saves you from memory leaks and crashes.
  • Basic safety: You take care to close cursors and other resources when an exception occurs, and you leave the system in a valid state. The raise_application_error function does a bit of this – cleaning up out variables where relevant. Note that you still have no idea what the block actually did – inserting one row out of three is a valid state, and so are unknown changes to variables in the calling block. In fact, in PL/SQL there is very little difference between basic safety and no safety, because Oracle works hard to make sure that truly invalid states don’t occur even if you do no exception handling at all. But invalid states can still happen if you have integrity rules that you expect your code to keep and that are not modeled into the database (foreign keys and similar constraints are a good example – if you didn’t model them into the tables, you will need to enforce them as part of basic exception safety).
  • Strong safety: This is what we usually aim for in exception handling – complete success or full rollback. Either the block did what it was supposed to do, or it left the system as if it had never been called, leaving the caller with all options open when deciding how to proceed. This is difficult to achieve in C++, but in PL/SQL it should be almost trivial: scope your transaction correctly so that Oracle’s built-in rollback can take care of most of the cleanup for you, release all opened resources, clean up changes to global variables (this can be tricky – so avoid global variables), and you should be good. A sketch follows this list.
  • No-throw safety: The operation will always succeed and never throw an exception. Usually both impossible and unnecessary, except maybe for code you use in exception handlers and similar cleanup operations.
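
To make strong safety concrete, here is a minimal sketch using a savepoint (the accounts table, the transfer procedure and its logic are hypothetical, purely for illustration):

procedure transfer (from_id in number, to_id in number, amount in number)
is
begin
    savepoint before_transfer;
    update accounts set balance = balance - amount where id = from_id;
    update accounts set balance = balance + amount where id = to_id;
exception
    when others then
        -- undo everything this block did: the caller sees either
        -- complete success or no change at all
        rollback to savepoint before_transfer;
        raise;  -- re-raise and let the caller decide how to proceed
end;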

Knowing all this may not cause you to write safer code, but hopefully you can at least think about how unsafe the code you write is, and what that will mean to those who use it.

***

If you are looking for something nice to read on the weekend, here are my recommendations:

Charles Schultz has a DB with high hopes for Christmas.
Jon Emmons explains about holiday decorations.
Dizwell released Doris, the queen of Oracle-on-Linux installations.


Logs and other Instrumentation tools

Attention All Application Developers:

One day, your application is going to be deployed in production.
We both know that even when that fateful day arrives, the application may still be full of bugs, and perhaps even a performance bottleneck or two.
And sooner or later the operations team that owns the production system will notice the bugs and bottlenecks, and they will call you. They will expect you to be able to diagnose the cause of the problem and depending on the severity of the issue either suggest a work-around, issue an emergency fix or implement a solution in the next version.
They will also expect you to do all that without interfering with production work. This means they will not allow you to upload new executables onto production. They are not doing this to make your life miserable; they are doing it because they are being paid to protect the production system from interruptions.

If you don’t want to look like an idiot at that point, you should prepare your application for live debugging.
A log with error messages that you can understand is a terrific start. It is even better when log messages are timestamped. If your application is multi-threaded, you want to know which thread wrote which message to the log.
A switch that causes the application to start writing incredibly detailed messages to the log, usually known as a “debug” or “trace” switch, will be highly useful and well worth the two hours it may take you to write one. Make sure your application polls this switch – the operations people will be unhappy if you tell them you need to restart production to generate a trace.
After this, the sky is the limit – write key performance events to a table, allow attaching a client that lets you look at the memory of the application while it is running, add secret APIs for turning on even more detailed debugging messages – I’ve seen them all, and they are all very useful.
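
For what it is worth, here is a minimal sketch of such a pollable debug switch in PL/SQL (the app_settings and app_log tables and the logger package are hypothetical names, just to show the shape of the thing):

create table app_settings (name varchar2(30), value varchar2(10));
create table app_log (
    log_time  timestamp default systimestamp,
    sid       number,
    message   varchar2(4000));

create or replace package logger as
    procedure debug (msg in varchar2);
end;
/

create or replace package body logger as

    function debug_enabled return boolean is
        l_value varchar2(10);
    begin
        -- poll the switch on every call: no restart needed to turn tracing on
        select value into l_value from app_settings where name = 'DEBUG';
        return l_value = 'ON';
    exception
        when no_data_found then
            return false;
    end;

    procedure debug (msg in varchar2) is
        pragma autonomous_transaction;  -- the log entry survives the caller's rollback
    begin
        if debug_enabled then
            insert into app_log (sid, message)
            values (sys_context('userenv', 'sid'), msg);
            commit;
        end if;
    end;
end;
/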

OracleNerd gets it.


Debugging SQL

I love SQL. Let me count the ways:

  1. It really fits the relational database model. You only have to work with Hibernate for a few days to appreciate the beautiful simplicity of SQL.
  2. SQL doesn’t have direct memory access, which means I don’t have all those freaky bugs that C developers have, where you overwrite random chunks of memory by mistake and get totally random results that are impossible to debug.
  3. Even more important – the concurrency model is relatively simple and as a bonus the DB will detect and resolve deadlocks for you.
  4. It is easy to read. There are no “obfuscated SQL” contests for a reason.
  5. Eye on Oracle has a few more reasons, just in case you need convincing.

However, one of the things I really wish I had is a nice SQL debugger, because I’m tired of picking apart complicated SQL statements by hand just to find the bad join that causes duplicate rows or missing rows or whatever.

I realize that debugging SQL is not as trivial as it would be for C, because an SQL statement does not contain a list of well-defined operations that one can step through one by one while watching key variables. But each query is made up of a bunch of logical set operations, even if Oracle’s implementation of them can get complicated – an “exists” is logically a nested loop, a join is a set product that then gets filtered, and so on.

So why not build a tool that lets you walk through these operations? The tool should show you the data set after each operation, which would make it much easier to find the join that breaks everything. After all, we go through a very similar process by hand every time we debug a query – the sketch below is roughly it. It is clear that we need a tool to support a task we do so many times a day.
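
The manual version of that walk-through usually looks something like this: break the query into factored subqueries and inspect the row count after each step (this sketch uses the familiar emp/dept demo tables):

with step1 as (
    select e.empno, e.deptno, e.sal
    from   emp e
    where  e.sal > 1000),                  -- check: select count(*) from step1
step2 as (
    select s.empno, s.sal, d.dname
    from   step1 s
    join   dept d on d.deptno = s.deptno)  -- a sudden jump in row count here exposes a bad join
select dname, count(*), sum(sal)
from   step2
group  by dname;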


Best Practices can be Dangerous

Secret Geek posted about a specific behavior very common among pretty much all technical folks: The tendency to turn good advice into gospel.

He gives the example of having one exit point per function – advice that was important in the past but has lost its importance in some of the newer languages.

There are tons of such “best practices” for developers: gotos are considered evil, multiple inheritance is scary, and global variables will cause the western hemisphere to sink into the ocean. All of the above represent very good practices. However, sometimes, when you decide to ignore this advice in favor of other considerations such as code readability, you risk running into opposition that cannot rationally argue the pros and cons of each approach and just keeps quoting the best practice: “… but you can’t use a GOTO! It is EVIL!”

DBAs have their own best practices, of course. Buffer cache hit ratio used to be everyone’s favorite, but it has been so widely discredited by now that it can no longer serve as an example. At least I hope so.

Sometimes, working in an organization, you run into a very strange configuration that can only be the result of a best practice whose justification was long forgotten; the implementation lives on because no one understands the matter well enough (or cares enough) to try to change the configuration.

One such configuration I’ve seen is mounting a single NFS volume at multiple mount points and configuring the DB to use them as though they were separate volumes (i.e. putting different data files on different mount points). I’m not sure what effect it was supposed to achieve – maybe better striping? I do know that it was considered a best practice on some system at some point in time, and no one knows if it is still relevant.

If I were Kevin Closson, I would have figured the matter out in about half a day of work and a few well-aimed benchmarks. I’m not as good with benchmarks, and I’ve never taken the time to get to the root of the matter, so I have to wonder how everyone else can be so sure of the performance benefits of this configuration when they have never tested the alternative, and can’t even theoretically explain what could cause such benefits.

And now for something completely different: Alex Gorbachev of Pythian is having lots of fun at the Miracle Open World conference, and he writes about it on the Pythian blog. Boy, does it sound like a great conference.


Perl or PL/SQL?

It happens every three months – the project managers need to generate their quarterly report. Every three months they decide that the reports they used until now are not good enough, and they come up with a new design for the report. The DBAs then need to get the data, preferably yesterday. After we gather the data and create the report, there is another week during which we try to convince them that the results are correct, and they try to convince us to massage the data a bit so it reflects the results they expected.

This time, to generate the report I need to collect data from six different databases which together contain about 30 schemas; one of them is SQL Server. Sounds like fun already.

Naturally, I want to write a script that will run all the necessary queries, do all the necessary calculations and aggregations, and dump everything into a table. Then I’ll grab the results from that table and put them into a nice-looking Excel spreadsheet (with graphs and colors!).

My only dilemma is: Perl or PL/SQL?

Why Perl?

  1. I know Perl much better than PL/SQL, so I’ll get it written faster and more correctly.
  2. Perl has no problem connecting to SQL Server – I’m kind of worried it may be an issue with PL/SQL.
  3. If I do the report in Perl, I won’t have to create all those linked servers to all the other systems. Our security team dislikes linked servers.

Why PL/SQL?

  1. Well, learning PL/SQL was one of my objectives this year – so this is a great opportunity.
  2. It will probably run much faster. I can’t be sure about it at this stage, but this is usually true.
  3. Other DBAs can maintain PL/SQL code. I’m the only one who does Perl.

I think that if I can find a way to get the SQL Server data from PL/SQL, I’ll go the PL/SQL route. I believe the feature is called Heterogeneous Services; last time I checked it was quite complicated to use, but maybe it is worth a second chance. If it works, the SQL Server tables should be reachable through a plain database link, as sketched below.
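
Assuming the gateway is configured, the query side would look like any other database link access – something like this (the link name mssql_link and the table are hypothetical):

select project_name, status
from   project_status@mssql_link;  -- mssql_link points at the SQL Server database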