Shrinking Tables Once More

Commenting on my post from last week, “an oracle dba” said:

“”shrink” would down the HWM of object, but may not for datafile. and it would invalidate the dependent plsql/trigger definitely.”

I did not believe that! Surely something as innocent as shrinking has no reason to invalidate the triggers. Maybe in 10g, but 11g has Fine Grained Dependencies, surely merely shrinking can’t invalidate a trigger in 11g!

So I tried it:

SQL> create table test2 (ID number, c_date date);

Table created.

SQL> create trigger trig1
  2  after insert on test
  3  referencing new as newrow
  4  for each row
  5  begin
  6  insert into test2 values (:newrow.id,sysdate);
  7  end trig1;
  8  /

Trigger created.

SQL> select object_name,object_type,status from user_objects where object_name='TRIG1';

OBJECT_NAME         OBJECT_TYPE          STATUS
-------------       -------------------  -------
TRIG1               TRIGGER             VALID

SQL> delete from test where mod(id,5)=0;

1000 rows deleted.

SQL> alter table test enable row movement ;

Table altered.

SQL>  alter table test shrink space cascade;

Table altered.

SQL> alter table test disable row movement ;

Table altered.

SQL> select object_name,object_type,status from user_objects where object_name='TRIG1';

OBJECT_NAME         OBJECT_TYPE          STATUS
-------------       -------------------  -------
TRIG1                TRIGGER             INVALID

Learn something new every day 🙂
Thanks “an oracle dba”!


Would You Rather Maintain SQL or PL/SQL?

Rob Van Wijk wrote about the reputation that SQL has as unmaintainable. He claims that SQL is not inherently less maintainable, but rather that developers who say that SQL is difficult to maintain are simply not as skilled in SQL as they are in PL/SQL, and they certainly don’t document their SQL to the same level they document their PL/SQL.

I mostly agree with Rob. It is not a coincidence that developers are more comfortable with PL/SQL than with SQL. PL/SQL is a procedural language. Nearly everyone learned programming with a procedural language, and probably programmed in a procedural language for at least three years before being introduced to SQL.

SQL is a different beast. It is a data maniplation language based (roughly) on the relational model, and most of its operators are based in relational algebra. Quite different from Pascal, C or Java. Its not really surprising that most programmers are never entirely comfortable around SQL and keep being surprised by its behavior.

I think there are few other reasons that SQL is percieved as less maintainable though:

  1. Readability of SQL is heavily dependent on the underlying data model. A good model will lead to more readable and maintainable SQL. When the data model does not match the reports that need to be generated, PL/SQL may have an advantage, its readability being less impacted by the data model.
  2. Differences in how SQL statements are written can lead to differences in performance. It is somewhat true in PL/SQL as well, but I think SQL is more sensitive. Performance often takes priority over readable code. Not using subquery factoring because it results in sub-optimal optimizer decision is one consequence. Of course, the reasons for writing SQL in a specific way are very rarely documented, making the code even less maintainable.
  3. As a result of #2 – developers who will gladly refactor PL/SQL code to make it more readable, will think at least twice before refactoring SQL. It is nearly trivial to predict how a given code change will impact PL/SQL performance, but nearly impossible to do the same for SQL (unless you are Jonathan Lewis).
  4. Debugging is a big issue with SQL. In PL/SQL it is easy to add few put_line messages to track what exactly your code is doing and why. There are even nice commercial debuggers. Finding bugs in SQL is still a bit of a black art that must be done without any tools or methods.

I’m not sure if there are any guidelines on when to prefer SQL and when PL/SQL is better, but at least the discussion is a bit more complete.


Read Only User

“Oh mighty DBA” said the developer, “You are wise and very attractive. Can I please have access to the TEST system of my application? I need to debug an issue that QA found and I cannot reproduce”.

(I’m paraphrasing a bit).

“No way. You will change few things while you are there and ruin the test cycle. Even if you won’t, QA will think you did and this will ruin the test cycle.”

“I see your point, wise and attractive DBA. In this case, can I have just read-only access to the system?”

(I hate this part in the conversation. When a request sounds innocent and trivial, but is in-fact complicated and full of trickery.)

“I’ll see what I can do. But you don’t get to execute any functions. Just select on table and views. And you’ll need to prefix the table names with the name of the original user.”

(Well, actually giving synonyms is easy enough. If he’ll insist, I’ll add them).

I wrote the function that does it. It is horrible in several ways:

  1. It uses “execute immediate”.
  2. No bind variables. I don’t think I can use them in this case, but it still means I’m doing bad things to the library cache.
  3. I add “with grant” privilege to all object privileges the original user had. I did not want to touch the original user, but as I explained before, I had no choice. I could have been more selective here, though.

I hope it will be useful, but I post it mostly as a rough reference. Think twice before you use it.

CREATE OR REPLACE
PROCEDURE GRANT_READ_ONLY(RO_USER in varchar2, ORIG_USER in varchar2,with_grant in BOOLEAN 
default true) AUTHID current_user AS stmt varchar2(4000);
get_grant_objs varchar2(4000);
TYPE cur_typ IS REF CURSOR;
c_cursor cur_typ;
Type user_priv_typ is record (
    table_name all_tab_privs.table_name%type,
    privilege all_tab_privs.privilege%type,
    grantor all_tab_privs.grantor%type);
user_priv_rec user_priv_typ;

BEGIN
  if (with_grant) then
      get_grant_objs := 'select table_name,privilege,grantor from dba_TAB_PRIVS dtp
          join dba_objects do on dtp.grantor=do.owner and dtp.table_name=do.object_name
          where grantee=''' || ORIG_USER || '''  order by TABLE_NAME';
      open c_cursor for get_grant_objs;
      LOOP
        FETCH c_cursor into user_priv_rec;
        EXIT WHEN c_cursor%NOTFOUND;
        stmt := 'GRANT ' || user_priv_rec.privilege || ' ON ' || user_priv_rec.grantor || '.' || 
        user_priv_rec.table_name || ' TO ' || ORIG_USER || ' WITH GRANT OPTION'; 
        execute immediate stmt;
        --dbms_output.put_line(stmt);
  END LOOP;
  CLOSE c_cursor;
  end if;
  for obj in 
  (
  select object_name from all_objects
  where owner=ORIG_USER
  and object_type in ('SEQUENCE','TABLE','VIEW','MATERIALIZED VIEW'))
  LOOP
    stmt := 'GRANT SELECT ON ' || ORIG_USER || '.' || obj.object_name || ' TO ' || RO_USER;
    execute immediate stmt;
    --dbms_output.put_line(stmt);
  END LOOP;
END;

Just Return Any Random Row

Sometime I see a developer try to run something like “select deptno,sal from emp group by deptno”. I usually ask the developer “But employees in the department have many different salaries. Which one do you want?”, and sometimes I get an amazing answer: “I don’t really care, I just want to see any random salary”. Usually, I tell her to aggregate by “max” or something similar, if she doesn’t care about the result.

But today I was really annoyed. So I wrote a custom aggregation function that will return a random salary.  I admit, it is not as useful as str_agg, but there seem to be few developers who are interested in this feature.

The main challenge was to make the aggregation truly random. When you aggregate, you always have the current aggregation value and a new one. I have to randomly choose one of them – but I can’t make it a simple 50/50 selection.

Suppose I have three rows. The way aggregation works, I first take two rows and flip a coin to pick one. Now I have a current value – and I have to take the third row and decide if I want to keep the current value or the new one. I can’t flip the coin again – because if the third row has 50% chance to be selected, this means the first and second rows only have 25% chance each. Not fair. So I need to give the third row 1/3 chance, and the current value 2/3.

create or replace type agg_t as object (

    curr_value number,
    running_count number,

    static function ODCIAggregateInitialize(sctx  in out agg_t)
                    return number,

    member function ODCIAggregateIterate   (self  in out agg_t,
                                            new_value in number)
                    return number,

    member function ODCIAggregateTerminate (self         in     agg_t   ,
                                            return_value    out number,
                                            flags        in number      )
                    return number,

    member function ODCIAggregateMerge(self in out agg_t,
                                       ctx2 in agg_t    )
                    return number
);
/

create or replace type body agg_t is 

    static function ODCIAggregateInitialize(sctx in out agg_t)
        return number is
    begin
        sctx := agg_t(null,0);
        return ODCIConst.Success;
    end;

    member function ODCIAggregateIterate(
      self in out agg_t, new_value in number)
        return number is
    begin
        running_count:=running_count+1;
        if (mod(dbms_random.random,running_count)=0) then
        	curr_value:=new_value;
        end if;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateTerminate(self in agg_t,
        return_value out number, flags in number) return number is
    begin
        return_value := curr_value;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateMerge(self in out agg_t,
        ctx2 in agg_t) return number is
    begin

    	if (mod(dbms_random.random,running_count+ctx2.running_count)<ctx2.running_count) then
    		curr_value:=ctx2.curr_value;
    	end if;
        running_count:=running_count+ctx2.running_count;
        return ODCIConst.Success;
    end;
end;
/

create or replace function agg_random (input number) return number
    parallel_enable aggregate using agg_t;
/
&#91;/sourcecode&#93;

So easy! I love user defined aggregations! But you have to be careful when writing them. I accidentally replaced "number" with "varchar" somewhere in the code and got a lovely error message when I tried to run it:
<pre>SQL&gt; select deptno,agg_random(sal) from emp group by deptno;
select deptno,agg_random(sal) from emp group by deptno
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2104
Session ID: 138 Serial number: 1496</pre>
And in the alert log: <em>Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x0] [PC:0x3D31168, ___intel_new_memcpy()+40]</em>
I segfaulted! and you can imagine how fun it was to debug my pl/sql code using the trace file...

Anyway, after all the debugging is done, its time to show the code to my developer:


SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            1300
        20            2975
        30            1600

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            3000
        30            2850

But I still wasn’t happy. What if I have a bug and the selection is not random enough? After all, the values in the example seems a bit repetitive.

No problem. I’ll use my function on a table containing numbers 1 to 100, pick few random numbers, and then use the well known chi-square test to check if my random selection matches the uniform distribution.

SQL> create table r1 as select level l from dual connect by level<100;

Table created.

SQL> create table r2 (n number);

Table created.

SQL> insert into r2 select agg_random(l) from r1 group by ();

1 row created.

-- repeated 20 times

SQL> select count(*) from r2;

  COUNT(*)
----------
        20
        
        
SQL> DECLARE
  2   sig   NUMBER;
  3   alpha NUMBER := 1;
  4   beta  NUMBER := 99;
  5   ttype VARCHAR2(20) := 'CHI_SQUARED';
  6  BEGIN
  7    dbms_stat_funcs.uniform_dist_fit('SCOTT', 'R2', 'N', 'DISCRETE', ttype, a
lpha, beta, sig);
  8
  9    dbms_output.put_line(sig);
 10  END;
 11  /
X-squared value : .9000000000000000000000000000000000000026 
Degree of freedom : 18.0000000013919621366679664

PL/SQL procedure successfully completed.

90% probability of getting my result assuming the function is good. I’m so happy. I knew that getting a degree in statistics was not a complete waste of time 🙂


Looking for great PL/SQL

Steve Yegge is a developer who writes way too much. Each one of his blog posts can easily take an hour to read, but they tend to contain nice gems that make the long read worth the time.

Today I spent an hour reading about 50% of a fairly interesting article about how to practice programming. His main point is that just programming on the job will not make you a better programmer. You have to actively strive to improve. (I just saved you an hour of reading!). One of his suggestions on how to become a better programmer is this:

Read through someone else’s code for 20 minutes. For this drill, alternate between reading great code and reading bad code; they’re both instructive. If you’re not sure of the difference, ask a programmer you respect to show you examples of each. Show the code you read to someone else, and see what they think of it.

Sounds like a good advice. Now I only have to find great and terrible code examples. Well, if I want to practice C, this is not an issue – for great code examples I read Linux Kernel code, and for awful examples I read code that I wrote 7 years ago (and it looks worse every year).

But for PL/SQL , I’m a bit stuck. I can still read my own code for bad examples, but where can I find examples for great code? My colleagues write nice enough code, but we have our own coding style and standards, and if something is wrong with these, it’ll be difficult to tell without outside input.  Somehow, there is simply no open-source code written in PL/SQL that I can read to get a good idea of how PL/SQL should be written.

The closest I’ve found is the code examples in Steven Feuerstein’s PL/SQL book. These have been extremely helpfull in teaching me how to handle exceptions, iterate through cursors, work with associative arrays, etc. But these are just examples, I can’t learn how to cleanly build a fairly large working module from these great examples. Its nothing like reading through the Linux kernel.

I think the issue is somewhat related to the problem Tyler Muth raised about lack of reusable PL/SQL modules. Oracle has an amazing community, but for some reason we don’t share our code and we don’t participate in open-source community projects. Most of our community efforts revolve around helping each other learn new features and solve performance issues.

We are probably missing an opportunity to help each other resolve performance problems before they occur.


From C to PL/SQL

The most dangerous assumptions are those that you are not aware of. We all know we should question assumptions and verify everything, but how can you question the assumptions you don’t even know you have?

C developers who move to PL/SQL development should constantly be aware that they are not in AT&T labs anymore. Of course, constant awareness is easier said than done, and Oracle keeps surprising.

It keeps surprising even after we learned that in Oracle, integers are not really integers, and are not faster than decimals. Unless they are binary_integers. Binary_integers sounds nice and comfortable, C developers love binary – after all C is just portable assembly. Don’t get your hopes up though – you won’t find binary shift operators in the language. If you ask your co-workers about the missing binary operations, you are in for a lecture. Database people have their own optimization methodologies, and they only worry about bits and bytes when they count IO operations.

How to recognize a C developer who now dabbles in PL/SQL? Ask him how many times will the following code print “… running very slow calculation …”:

create function expr return number
is
begin
dbms_output.putline('... running very slow calculation ...');
return 10;
end;

begin
for i in 1..expr() loop
insert into t values (i);
end loop;
end;

A C developer will say “ten times”, and will insist on “optimizing” the loop and write:

declare
e number;
begin
e:=expr();
for i in 1..e loop
insert into t values (i);
end loop;
end;

PL/SQL developers will know that loop boundaries are calculated once, when you first enter the loop. Therefore the line will be printed exactly once, and the second version is not faster than the first.

As they said in Pulp Fiction:
“It’s the little differences. I mean they got the same shit over there that they got here, but it’s just – it’s just there it’s a little different.”

***
Another identifying trait of former C developers is that they are forever interested in low level internals, and particularly memory management.
Lucky for me, Christo Kutrovsky published a paper about this exact topic. It is the perfect paper – crystal clear, gets into all the interesting and important details, no attempts to dumb down anything, and it even has practical conclusions.


Exception Safety for PL/SQL

One of the things I like about C++, is that there are tons of theory around the language. There are many smart people who sit around and write articles about how different language mechanisms are best used. We have a bit of it in PL/SQL, but not nearly enough.

One of the most important articles about exception safety in C++ is Abraham’s Exception Safety in Generic Components. 90% of it is totally irrelevant to PL/SQL developers because the smart people from Oracle chose not to inflict templates and generic components on us (at least yet, who knows what will happen in the next version). However, in this article Abrahams defines the different levels of exception safety that a code component can achieve, and this directly relates to PL/SQL developments.

The levels are described in terms of a PL/SQL block that is called by another, and what can the caller expect after the called block has thrown an exception when each level is implemented.

Exception safety levels, as I translate them from C++ to PL/SQL:

  • No safety: You don’t handle exceptions. When your code breaks you have no idea what is the state of your structures and data, and all kind of resources like cursors may be left open. Note that you can achieve this state by completely ignoring the possibility of errors, or by using “when others null”. Be glad that you are developing in PL/SQL and not in C++, because at least Oracle keeps some semblance of data consistency, and keeps you from memory leaks and crashes.
  • Basic Safety: You take care to close cursors or other resources when exception occurs, and that you leave the system in a valid state. Raise_application_error functions does a bit of this – cleanup of out variables if relevant. Note that you still have no idea what the block actually did – inserting one row out of three is a valid state, so are unknown changes to variables in the calling block. In fact, in PL/SQL there is very little difference between basic safety and no safety – because Oracle works hard to make sure that truly invalid states don’t occur even if you do no exception handling at all. But they can happen if you have integrity rules that you expect your code to keep and are not modeled into the database (foreign keys and similar constraints are a good example for this – if you didn’t model them into the tables, you will need to take care of them as part of basic exception safety).
  • Strong Safety: This what we usually aim for in exception handling – complete success or full rollback. Either the block did what it was supposed to do, or it left the system as if it was never called, leaving the caller with all the options open when deciding how to proceed. It is difficult to achieve this state in C++, but in PL/SQL it should be almost trivial – scope your transaction correctly, so Oracle’s built in rollback can take of most of the cleanup for you, release all opened resources, clean up changes to global variables (this can be tricky – so avoid using global variables), and you should be good.
  • No-throw Safety: The operation will always succeed and never throw an exception. Usually both impossible and unnecessary. Except maybe for code you use in exception handling and similar cleanup operations.

Knowing all this may not cause you to write safer code, but hopefully you can at least think about how unsafe is the code you write, and what it will mean to those who use it.

***

If you are looking for something nice to read on the weekend, here are my recommendations:

Charles Schultz has a DB with high hopes for Christmas.
Jon Emmons explains about holiday decorations.
Dizwell released Doris, the queen of Oracle-on-Linux installations.


Dissociated Press

There is this nice Emacs game where you take some random text and use random walk to generate a somewhat similar but completely random text out of it. It is called “Dissociated Press“.

Out of sheer randomness, I’ve implemented this in Oracle.

Notes:

  • I’m using varchar to hold the text, which limit us to short and therefore uninteresting texts. CLOB would be better.
  • If there is any serious amount of text involved, you’ll want to create a non-unique index on the first two columns of the markov table after you populate the table, and gather statistics on that.
  • Feel free to criticize my code. That is why I post it. But note that I’ve broken some lines so it will look nice in the blog.
  • Also feel free to rewrite the entire thing with a single model statement.
  • The algorithm should be pretty straightforward, but you can read about a similar implementation here.
CREATE TABLE markov
       (c1 VARCHAR(30),
       c2 VARCHAR(30),
       c3 VARCHAR(30));

DECLARE
        cnt BINARY_INTEGER;
        i BINARY_INTEGER;
        my_str VARCHAR(4000) := 'A bug that has been documented.
To call something a feature sometimes means the author of the
program did not consider the particular case, and that the program
 responded in a way that was unexpected but not strictly incorrect.
 A standard joke is that a bug can be turned into a feature simply
 by documenting it (then theoretically no one can complain about
it because its in the manual), or even by simply declaring it to
be good. Thats not a bug, thats a feature is a common
catchphrase.';
        my_comma_str VARCHAR(4000);
        my_table dbms_utility.uncl_array;
        num_itr INTEGER := 50;
        tmp_c1 VARCHAR(30);
        tmp_c2 VARCHAR(30);
        tmp_c3 VARCHAR(30);
BEGIN
     select REPLACE('"' ||
	REPLACE(my_str,' ','","') || '"',',"",',',')
	INTO my_comma_str FROM dual;
     dbms_utility.comma_to_table(my_comma_str, cnt, my_table);
     FOR i in 3..cnt LOOP
         INSERT INTO markov (c1,c2,c3) VALUES
	(replace(my_table(i-2),'"',''),
	replace(my_table(i-1),'"',''),replace(my_table(i),'"',''));
     END LOOP;
     COMMIT;
     SELECT c1,c2,c3 INTO tmp_c1,tmp_c2,tmp_c3 FROM markov
     WHERE ROWNUM=1;
     DBMS_OUTPUT.put_line(tmp_c1 || ' ' || tmp_c2 || ' ');
     FOR i IN 1..num_itr loop

     	 SELECT c1,c2,c3  INTO tmp_c1,tmp_c2,tmp_c3 from
		  (SELECT * FROM markov WHERE c1=tmp_c2 AND
		  c2=tmp_c3 ORDER BY dbms_random.VALUE) t
		  WHERE ROWNUM=1;
		  DBMS_OUTPUT.put_line(tmp_c2 || ' ');
	 END LOOP;

END;

**
This post has been partially inspired by this one.


Small shortcut in a large report

Came back from Oracle OpenWorld to a flurry of business demands. End of Quarter is here and with it new demands for reports and data.

The main report that I’m writing is rather large, generating the data requires about 30 different queries on about 10 databases. Some run in few seconds while others, despite my best tuning efforts can take an hour or more.

So, I wrote a PL/SQL procedure that runs all the queries on all the DBs and collects the results into a single table that I’ll use in the report. That’s pretty cool, but suppose there is an error in query number 23? After I fix the error, I have to run the report again which takes hours. Quite annoying.

So, I created a small table:

CREATE TABLE PROGRESS
( "STAT_NAME" VARCHAR2(30),
"SCHEMA_NAME" VARCHAR2(100),
"TIME_DONE" DATE)

And now, before each query runs, I check in the progress table if it was already done or not. If it was already done, I don’t need to do it again. If it wasn’t, I run it and after it ran successfully, I add it to the table.

Here’s the procedure that takes care of this process:

create or replace
PROCEDURE CONDITIONAL_EXEC ( in_stat_name IN VARCHAR2,
in_schema_name IN VARCHAR2,sql_stmt in varchar2) AS

stat_status int;
BEGIN
select count(*) into stat_status from progress where schema_name=in_schema_name and stat_name=in_stat_name;
if stat_status=0 then
dbms_output.put_line ('About to run: ' || sql_stmt);
execute immediate sql_stmt;
insert into progress values (in_stat_name,in_schema_name,sysdate);
commit;
end if;
END CONDITIONAL_EXEC;

Advantages of this approach:

  1. I save time by not running the same stuff twice
  2. I can always “undo” something that already ran by deleting from that table
  3. I can watch the progress of my report in the table
  4. The table contains the finish time of each query. This means I have rough information on how long each query took to run and I can use this information to decide which queries need more work.
  5. I can also use the time information to show the business how much db time each statistic they request costs them.Maybe they’ll decide that some statistics are not that important? Especially those that are carefully crafted so we can’t use any existing aggregated data to produce them. Unlikely, but who knows?

Not bad for 5 minutes of work.


Perl or PL/SQL?

It happens every three month – the project managers need to generate their quarterly report. Every three month they decide that the reports they used until now are not good enough and they come up with a new design for the report. The DBAs now need to get the data, preferably yesterday. After we gather the data and create the report, there is another week during which we try to convince them that the results are correct and they try to convince us to massage the data a bit to reflect the results they expected.

This time, to generate the report I need to collect data from six different databases which  together contain about 30 schema, one of them is SQL Server. Sounds like fun already.

Naturally, I want to write a script that will run all the necessary queries, do all the necessary calculations and aggregations, and dump everything into a table. Then I’ll grab the results from that table and put them into a nice looking Excel (with graphs and colors!).

My only dilemma is: Perl or PL/SQL?

Why Perl?

  1. I know Perl much better than PL/SQL, so I’ll get it written fast and more correctly.
  2. Perl has no problem connecting to SQL Server – I’m kind of worried it may be an issue with PL/SQL.
  3. If I do the report in Perl, I won’t have to create all those linked servers to all the other systems. Our security team dislikes linked servers.

Why PL/SQL?

  1. Well, learning PL/SQL was one of my objectives this year – so this is a great opportunity.
  2. It will probably run much faster. I can’t be sure about it at this stage, but this is usually true.
  3. Other DBAs can maintain PL/SQL code. I’m the only one who does Perl.

I think that if I’ll find a way to get the SQL Server data from PL/SQL, I’ll go the PL/SQL route. I think its called “heterogeneous server”, but last time I checked it was quite complicated to use. Maybe it is worth a second chance.