C developers with SQL code

Jeff Hunter posted an amusing story about how he optimized a query and what the developer did with his suggestions. The story is so cool not because it is one of these horrible WTF stories that are unlikely to happen in any but the most moronic IT organizations, but because this is a rather harmless oddity that happens all the time.

Most developers started their career by studying CS. This means about three years of writing code in Pascal, C, Java or whatever language CS departments prefer these days. After three years of C development you think of every problem in terms of arrays and loops, after three years of Java you think about objects and patterns. You are not even aware that you are doing this, but everything you look at instantly becomes an array with a loop, or a virtual private class.

At least in my days, databases were a highly optional course in the CS department, so the new developer on the team is probably seeing SQL for the first time in her life, after years of converting problems to arrays and loops. Unfortunately, a database table looks exactly the same as an array. The developer’s automatic response is to write a loop that goes over the array. Imagine her horror when she discovers that SQL doesn’t do loops. With few exceptions, even after learning SQL, the developer  will continue thinking of it in terms of loops and arrays. For the procedural developer, everything is a full table scan and nested joins.

At this point, his code will do funny things like fetch entire tables and iterate over the results to count the number of rows. Maybe not that funny, but close enough. In a good development organization, there will be code reviews and his manager or peers will turn him to the path of efficient database development.

In my experience, it is best that if at this fragile stage, no one will tell the developer about PL/SQL. The longer the developer has to work only with set operations within the database,  the more he will get the ideas of database developement and the better DBA he will become. Developers need to be broken of their loop addiction early and brutally. If they find out about PL/SQL too soon, they will always have the crutch of going back to their loops and will never really get DBs, or at least the process will be much delayed.

DBAs working with development teams really need to take care in educating their developers. This is really preventing performance issues before they are even written, and it is much easier to solve problems at this stage.

Advertisements

4 Comments on “C developers with SQL code”

  1. Gary says:

    So true, and also for mainframe developers coming to databases for the first time.

  2. Sidhu says:

    Totally agree…

    Sometimes, the freshers coming after their degrees are not even aware about database, SQL and all this stuff. & I have seen many C & Java programmers also, writing procedures of hundereds of lines but without handling even a single exception

    Sidhu

  3. Adi Stav says:

    That’s an interesting observation. I’d noticed that in non-database contexts, too; often UNIX shell code wants to be written in such a non-loopy way. Like so:

    find . -name ‘*.pyc’ |
    sed ‘s/\.pyc$/\.py/’ |
    xargs rm -f

    Instead of the loopy:

    for i in `find . -name ‘*.pyc’`
    do
    rm -f `echo $i | sed ‘s/\.pyc$/\.py/’`
    done

    Which takes ages! And written by an extremely capable colleague, too. I was appalled.

    It’s simply functional programming. I guess what makes shell and SQL alike in this aspect, is that both languages specialize is transferring work to an external process, where the external process is efficient but the transfer is the hog. You *have* to specialize in specifying streams and relations in advance, or you spent all your time running back and forth with very little data.

  4. prodlife says:

    Hi Adi,

    An interesting observation, especially because a year ago I happened to see an old Unix guru write shell scripts that ran queries on the database and did some things with the results.

    He was doing perfectly fine as long as he could pretend that tables were files and he was using AWK to get specific rows and columns. The “relational” part of relational databases – the fact that different tables are related to each other in specific ways and your code is more efficient when you use these relations, that didn’t map well into the “SQL is AWK” model, and caused some interesting code to be written. Think about how you’ll implement a “join” in a shell script, and you get some ideas of the code I reviewed.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s