Would You Rather Maintain SQL or PL/SQL?

Rob Van Wijk wrote about the reputation that SQL has as unmaintainable. He claims that SQL is not inherently less maintainable, but rather that developers who say that SQL is difficult to maintain are simply not as skilled in SQL as they are in PL/SQL, and they certainly don’t document their SQL to the same level they document their PL/SQL.

I mostly agree with Rob. It is not a coincidence that developers are more comfortable with PL/SQL than with SQL. PL/SQL is a procedural language. Nearly everyone learned programming with a procedural language, and probably programmed in a procedural language for at least three years before being introduced to SQL.

SQL is a different beast. It is a data maniplation language based (roughly) on the relational model, and most of its operators are based in relational algebra. Quite different from Pascal, C or Java. Its not really surprising that most programmers are never entirely comfortable around SQL and keep being surprised by its behavior.

I think there are few other reasons that SQL is percieved as less maintainable though:

  1. Readability of SQL is heavily dependent on the underlying data model. A good model will lead to more readable and maintainable SQL. When the data model does not match the reports that need to be generated, PL/SQL may have an advantage, its readability being less impacted by the data model.
  2. Differences in how SQL statements are written can lead to differences in performance. It is somewhat true in PL/SQL as well, but I think SQL is more sensitive. Performance often takes priority over readable code. Not using subquery factoring because it results in sub-optimal optimizer decision is one consequence. Of course, the reasons for writing SQL in a specific way are very rarely documented, making the code even less maintainable.
  3. As a result of #2 – developers who will gladly refactor PL/SQL code to make it more readable, will think at least twice before refactoring SQL. It is nearly trivial to predict how a given code change will impact PL/SQL performance, but nearly impossible to do the same for SQL (unless you are Jonathan Lewis).
  4. Debugging is a big issue with SQL. In PL/SQL it is easy to add few put_line messages to track what exactly your code is doing and why. There are even nice commercial debuggers. Finding bugs in SQL is still a bit of a black art that must be done without any tools or methods.

I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better, but at least the discussion is a bit more complete.


18 Comments on “Would You Rather Maintain SQL or PL/SQL?”

  1. chet says:

    I like both but tend towards the SQL side in only the rarest of circumstances (400 line query for example).

    Debugging is a bit more difficult, the only way I do it is through smaller sets.

    Wholeheartedly agree with #1 though, the design is very, very important and will make thing everything downstream that much cleaner. I even wrote about it (generally) recently. http://www.oraclenerd.com/2009/02/coding-is-easy-ii.html

  2. Toon says:

    “A good model will lead to more readable and maintainable SQL”

    Not only that…

    It will also lead to more performant execution plans. The cost based optimizer loves a clean relational database design.

  3. Rob van Wijk says:

    Hi Chen,

    Good post with many valid points. Some comments about the other reasons:

    1. True. Bad data models lead to overly complex queries. But also to more complex PL/SQL code.

    2. Agree.

    3. Isn’t performance the number 1 reason to refactor SQL? So you are probably already using explain plan and tracing to predict how the change will impact performance.

    4. I’ve never found debugging to be a big issue with SQL. Just build the query up from the inside out. Or peel it off like an onion if you have to maintain it. I consider model clause queries an exception here though. If debugging is an issue, then the SQL has probably reached such complexity where even I will have resorted to PL/SQL a long time ago 🙂

    Regards,
    Rob.

  4. prodlife says:

    Hey Rob,

    The “peeling” method works ok until you get to correlated subqueries.

    But I think the problem is that reading SQL makes no sense unless you are familiar with the data model, and the column names and table names make sense, and no one uses funny magic numbers in the query.

    In PL/SQL, probably for the same reasons that lead to more documentation, magic numbers are more rare and variable names are usually nicer.

    Nothing inherent, but many developers take PL/SQL more seriously, and it shows!

  5. Although I do agree with Rob, this discussion feels a bit weird to me. I’d think you’d need to know how to write sql before you write pl/sql. At least if you want to go after actual data in the database, which in my experience is often the case 🙂
    Almost feels like comparing, say, pro*cobol with sql. Or a unix one-liner with a perl or shell script.

  6. […] by H.Tonguç Yılmaz on March 21, 2009 I read Rob, Laurent and Chen on the topic, cool topic. As far as I follow OTN SQL and PL/SQL forum is the place for SQL madness, […]

  7. Hi Chen.

    The biggest advantage that the database has over any other tier is SQL.

    It’s pretty much the only area where the performance differential compared to logic in another tier is so big as to justify a set-based data centric approach and the only hope for nails in the coffin of the row-by-row ORM approach.

    PL/SQL is great. But where you see a bunch of logic in PL/SQL it’s often hard to justify its presence (in environments where the importance of the middle-tier is over-egged) unless it’s primarily set-based operations (generalisation).

    As I go from client-to-client, the biggest performance gains I see are to be made moving to a SQL approach from sometimes-easier-to-follow but inefficient nested PL/SQL lookups. It’s not a trivial exercise though.

    Subquery refactoring can be so useful in a) reliably refactoring someone else’s logic and b) leaving something behind which is maintainable by others.

    Having said that, I’ve seen plenty of cases where the network of PL/SQL calls is as difficult to follow as the most-complicated SQL statement you can imagine.

    But, as you say, the SQL is so dependent on a sensible data model.

    So much code in the database, SQL or PL/SQL, is written by non-database specialists who take for granted or aren’t interested in the key aspects of database design and development. Shame, but keeps many of us in a job.

  8. prodlife says:

    Just to clarify: I was talking about “edge” cases – when the resulting report requires non-trivial calculations in SQL that can also be done in PL/SQL (or Java).

    I do not suggest replacing all SQL with PL/SQL 🙂

  9. chris_c says:

    I tend to find SQL easier to read and maintain I think this is because I started out writting SQL reports for oracle/sql server databases and came quite late to PL/SQL.

    A good data model and a decent techinical specification and design can save a lot of time and effort as well, specifying the reports required before you start designing th eapplication can save a great deal of pain otherwuise you can end up with data that should be easy to query requireing all sorts of hacks to make it fit.

    FInally one big difference is the level of source code control too many people don’t place sql under any kind of source control, finding new bugs is always interesting when you can’t get to an earlier version that worked.

  10. Iggy Fernandez says:

    Hi, Chen,

    I take fierce objection 🙂 to the claim that “Not using subquery factoring because it results in sub-optimal optimizer decision is one consequence.” A “common table expression” (CTE) is subject to “complex view merging” just like views and inline views in the body of the SQL statement; the optimizer should not do anything different with it. Therefore, CTEs should not degrade performance. My understanding is that they should help performance if they are used more than once in the body of the SQL statement.

    Consider the following query; it finds parts supplied by all suppliers.

    SELECT partname
    FROM part
    MINUS
    SELECT partname
    FROM (SELECT suppliername, partname
    FROM supplier, part
    MINUS
    SELECT suppliername, partname
    FROM quote);

    The above formulation can be factored as follows and I don’t see why the optimizer should treat the factored formulation differently than the unfactored formulation.

    WITH

    — Step 1: All possible Supplier x Part combinations
    SupplierPart AS
    (SELECT suppliername, partname
    FROM supplier, part),

    — Step 2: Valid Supplier x Part combinations
    ValidSupplierPart AS
    (SELECT suppliername, partname
    FROM quote),

    — Step 3: Invalid Supplier x Part combinations
    InvalidSupplierPart AS
    (SELECT suppliername, partname
    FROM SupplierPart
    MINUS
    SELECT Suppliername, Partname
    FROM ValidSupplierPart),

    — Step 4: Parts not supplied by at least one supplier
    UnwantedPart AS
    (SELECT partname
    FROM InvalidSupplierPart),

    — Step 5: Parts supplied by all suppliers
    WantedPart AS
    (SELECT partname
    FROM part
    MINUS
    SELECT partname
    FROM UnwantedPart)

    SELECT partname
    FROM WantedPart;

    I guess I should write test cases and produce 10053 traces to support my claims but I’m not as hard-working as Messrs Kyte and Lewis. I think I’ll go take a nap now 🙂 Perhaps when I wake, I should do a 10053 trace. I would really like to find out for sure.

    Regards,

    Iggy

    P.S. I’ve got some more “fierce objections” to your posting and will write some more notes after I arise from my nap 🙂

  11. prodlife says:

    @Iggy

    Thanks for the excellent reply!

    I understood that while there is no reason for the optimizer to work differently on factored subqueries, it does anyway…

    Jonathan Lewis wrote about it: http://jonathanlewis.wordpress.com/2007/07/26/subquery-factoring-2/

    I’m looking forward to more fierce objections!

  12. Iggy Fernandez says:

    Chen,

    I see that in this case M’sieur Lewis resorted to the “various people have reported examples” argument 🙂 So while it’s possible that the Oracle optimizer falls down different code paths when presented with a CTE instead of an equivalent view or inline view I have yet to see evidence of it.

    P.S. At the risk of digressing from the main point of your posting, I’d like to say that I think that Oracle’s approach of cost-based query transformation (CBQT) mentioned by Lewis is not the best approach to optimization. Queries should first be transformed into a canonical form. This would ensure that *all* semantically equivalent queries result in the same query plan. See this posting by Christopher Charles titled “Databases – My Brilliant Career Tuning SQL
    ” at http://christophercharles.blogspot.com/2003_07_20_archive.html. Also refer to this old paper titled “SQL REDUNDANCY AND DBMS PERFORMANCE” by Fabian Pascal available at http://www.dbdebunk.com/page/page/1317920.htm. CBQT is explained in a paper titled “Cost-Based Query Transformation in Oracle” by Rafi Ahmed et al; it is available at http://www.msci.memphis.edu/~linki/7116papers/Ahmed06CostBased.pdf.

    Regards,

    Iggy

    • Iggy,

      Sorry to be so late in responding to your comment regarding “various people”. It looks as if you stopped reading the link that Chen supplied just a couple of paragraphs too soon and therefore missed my example, viz:


      As a trivial example of the side-effects (or current restrictions, perhaps) of factored subqueries, it is worth noting that it’s only in the most recent release of 10.2 that you can get a star transformation to work if the code has been embedded in a “with” clause.

      Regards
      Jonathan Lewis

      • Iggy Fernandez says:

        Thanks, Jonathan; I stand corrected. It’s a shame that a CTE doesn’t work exactly the same way as an inline view. I like the closing line of your post: “So be just a little bit careful with the technique – it’s very useful, very clean and tidy, and a good way to control difficult queries: but it may reduce the number of cunning tricks the optimizer can use.”

  13. Iggy Fernandez says:

    “Fierce objection” #2:

    re: “I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better”

    Here’s what Tom Kyte says:

    “anyone that knows me, knows my mantra – written many times:

    You should do it in a single SQL statement if at all possible.
    If you cannot do it in a single SQL Statement, then do it in PL/SQL (as little PL/SQL as possible!).
    If you cannot do it in PL/SQL, try a Java Stored Procedure. The number of times this is necessary is extremely rare today with Oracle9i and above.
    If you cannot do it in Java, do it in a C external procedure. This is most frequently the approach when raw speed, or the use of a 3rd party API written in C is needed.
    If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it… ”

    http://tkyte.blogspot.com/2008/04/stuff.html

    Gray and Reuter say the same thing more formally:

    “The basic theorem of relational databases says that these operators are Godel complete: They are equivalent to first-order predicate calculus [Codd 1971]. But the relational calculus is not Turing complete: It cannot describe all computable functions. A Turing-complete computational model is needed to write general programs. There is no need to invent such a language; almost any programming language will do. Consequently, SQL is always combined with a programming language.”

    http://books.google.com/books?q=gray+reuter+%22turing+complete%22

    And here’s what Emerson says on the topic of quoting others 🙂

    “We are like children who repeat by rote the sentences of grandames and tutors, and, as they grow older, of the men of talents and character they chance to see,—painfully recollecting the exact words they spoke; afterwards, when they come into the point of view which those had who uttered these sayings, they understand them and are willing to let the words go; for at any time they can use words as good when occasion comes.”

    http://www.bartleby.com/5/104.html

  14. […] Rob van Wijk wrote a fine post about choosing between SQL and PL/SQL, defending his choice of straight-up SQL for logic. Naturally, this triggered a lot of discussion, as well as a few responses from other blogs. Chen Shapira framed her response in a question about code life-cycle: would you rather maintain SQL or PL/SQL? […]

  15. joel garry says:

    Iggy:

    And Chris Date says (in 1994):

    “The biggest misconception is that SQL is the same as the relational model. In fact, when people look at relational products, they see SQL products. Not unnaturally, they assume that SQL and the relational model are the same thing. They can’t really be blamed for that. But the fact is that they’re not the same thing. As a consequence, all the shortcomings of SQL are regarded as shortcomings of the relational model. The biggest shortcoming of SQL is that it doesn’t implement the relational model! Consequently, the relational model is, criticized for the very fact that the vendors haven’t implemented it. It drives me crazy. Precisely because of that big misconception, we get into what we’ve been talking about already, namely that relational technology can’t do object-oriented kinds of things. It could, if it was implemented right.” (And since you refer to Fabian, you probably know all about this).

    So really, there is a problem with making the canonical SQL: it is incorrectly assuming the SQL product itself has been written to do that.

    This does not conflict with what Tom Kyte says about doing it in SQL first if you can: Oracle of course knows its’ own SQL, and what it can or cannot do with it. (Aside from bugs and evolving features, of course – most of us have had to rewrite SQL with upgrades and/or bug fixes, some of us repeatedly).

    This also does not conflict with what Mr. Lewis finds, as he is merely (imagine some airquotes there!) stating reasoned research on the physical implementation.

    There is simply a vast gulf between what is and what ought.

    (Of course I was looking for something else about Codd and performance when I found that Date quote, but the “crazy” part made it irresistable to me).

  16. Savio Mody says:

    Incredible! This blog looks exactly like my old one!
    It’s on a completely different subject but it has pretty much the same page layout and design. Great choice of colors!


Leave a comment