I'm just a simple DBA on a complex production system

Writing about all things production. Especially Oracle databases.

Streams and Grants July 30, 2008

Filed under: streams,tips — prodlife @ 2:24 am

I didn’t blog about streams in a while. Since the last time I blogged about streams our configuration changed a lot, but I’ll write about it another time. Its more fun to write about war stories than about nice clean architectures with ultra easy setup scripts.

Yesterday another happy customer went live on our reporting database. After the ultra easy setup, everything seemed to work fine. But today between 11am to 3pm, we accumulated over 20 errors in dba_apply_errors table. All of them were “ORA-00942: table or view does not exist”. All of them for “create view” and “create table” DDL.

Important streams fact of the day: Schema level replication does not replicate object grants given to that schema or on that schema. Even if you replicate DDL.

To resolve the issue we had to give the grants manually on the reporting database, and then use dbms_apply_adm.execute_all_errors to rerun all the DDLs that failed. It was very neat to see that once the grants were given, execute_all_errors worked perfectly and recreated all the views we missed.

If we anticipated more privileges changes on the replicated schemas, we would have had to find a permenant solution for their replication. Perhaps adding a global DDL rule with a filter to replicate grants only for these specific schemas. Lucky for me, privilege changes are not expected to happen in the future.

 

How Many Parameters can Rank() Take? July 24, 2008

Filed under: nerdism — prodlife @ 2:26 am

Laurent Schneider and I were discussing the analytics chapter of his nearly finished book, when he casualy mentioned that Rank() can take unlimited number of parameters. “Wow!” I said, “Thats cool. But how many parameters can it take before something crashes?”

I checked on my test 11g system, running on a puny windows xp.

Starting with 80 parameters, everything works fine:

SQL> select
2    rank(
3  1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8
,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8
,9,0
4    ) within group (order by
5  1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8
,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8
,9,0
6  ) x
7  from dual
8  /

X
———-
1

160 parameters also worked fine, 240 worked fine, returns results within two seconds.

So I tried with 320.  It has been running for almost 15 minutes when I decided to check with 310 and 300. At that point Oracle took 100% cpu and my mouse could barely move. I had to crash Oracle. You could say that three processes each with slightly over 300 parameters crashes Oracle :)

I continued donating my CPU to scientific discovery and reached interesting results:

At 255 parameters Rank still works fine. At 256, Oracle seems to go into an infinite loop – it never returns an answer and CPU climbs to 100%. Suspicious numbers, I’d say.

 

Educational Seduction July 16, 2008

Filed under: musing — prodlife @ 2:42 am

I want to talk about an article published at Medical Education magazine back in 1973, 4 years before Oracle was founded. Obviously this article has nothing to do with Oracle. It is still fascinating.

A group of academics had an hypothesis that student satisfaction with their teachers depends more on the teacher personality and charisma than on the content of the lecture.

To test this hypothesis, they hired a distinguished and authoritative looking actor, gave him a nice title and an impressive fictitious CV, and arranged for him to present in an academic conference in front of highly trained professional educators.  The researchers instructed the actor to present his topic and conduct his question and answer period with an excessive use of double talk, neologisms, non sequiturs, and contradictory statements. All this was to be interspersed with parenthetical humor and meaningless references to unrelated topics.

After the lecture was over, the researches asked the audience to fill feedback forms for the session. Similar to the kind you see in Oracle Education courses. This entire experiment was repeated three times with different audiences.

The feedback was terrific. The audience was unanimous that the lecture “simulated their thinking” and was “well organized”. Comments included: “Excellent presentation, enjoyed listening.” “What about the two types of games, zero-sum and non-zero sum?” “Too intellectual a presentation. My orientation is more pragmatic” “Extremely articulate.” “Interesting, wish he dwelled more on background.” “Good analysis of subject that has been personally studied before.”

Fascinating. Given a sufficiently charismatic presenter, an audience of highly trained professionals completely failed to detect that they were fed bullshit and that the lecture was content free.

The results of the experiment were probably influenced by the fact that the given field (education) is not a hard science, which makes it harder to detect bullshit. Life is easier for us, Oracle users. We just ask for test cases.

 

Troubleshooting Broken Clusterware July 10, 2008

Filed under: rac,tips — prodlife @ 3:06 am

I spent most of the day figuring out why one node on a 2-node RAC crashed during the night.

Here are the steps I found useful when debugging cluster issues:

  1. Check DB alert log on all nodes
  2. Check clusterware logs on all nodes. There are found in $CRS_HOME/log. The useful ones are the alert log, crsd log and cssd log.
  3. Check write permissions to voting disk. From all nodes. As Oracle and as root.
  4. Check the network interfaces. Both by looking at ifconfig on all nodes, pinging every node from every other node using all its names and interfaces (public, private, vip).
  5. Verify SSH the same way.
  6. Check that both nodes run the same OS version and same DB and clusterware versions (including patches).
  7. Stop and start clusterware on each node seperately and then on both nodes together.
  8. Reboot both nodes.

In my case, the interconnect disappeared during the night. Simply no ping on the internal interface. Maybe someone stole our network cards?

The symptoms  were

  1. After each restart, the first node up worked fine, while the second one failed to connect to cluster.
  2. In crsd log, it looked like this:
    2008-07-09 16:11:17.539: [ CSSCLNT][2541575744]clsssInitNative: connect failed, rc 9
    2008-07-09 16:11:17.539: [  CRSRTI][2541575744]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..
  3. In ocssd log:
    [    CSSD]2008-07-09 19:43:41.769 [1220598112] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
    [    CSSD]2008-07-09 19:43:42.753 [1115699552] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(14465) LATS(11886024) Disk lastSeqNo(14465)
    [    CSSD]2008-07-09 19:43:43.755 [1115699552] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(14466) LATS(11887024) Disk lastSeqNo(14466)
    [    CSSD]2008-07-09 19:43:44.758 [1115699552] >TRACE:   clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(14467) LATS(11888034) Disk lastSeqNo(14467)

Just in case you run into the same issue :)

 

OpenWorld 2008 – Did you vote? July 8, 2008

Filed under: openworld08 — prodlife @ 1:36 am

In a classic better-late-than-never move, I noticed that Oracle Mix site allows you to vote on Open World sessions. I should have noticed it before (say, in time to submit my own ideas!), because people mentioned it to me, but this just slipped through.

At least it was not too late to vote for interesting sessions. I was amazed by the number of sessions proposed. Lots of the sessions sound sexy and relevant. The best thing is that when looking through the sessions that appealed to me, I noticed that I don’t recognize many of the names. This is terrific, because as much as I enjoy hearing the usual gurus, sometimes I feel that after three conferences things begin to repeat themselves with same speakers and same topics all the time. I assume that this is because conference organizers prefer to stay on the safe side with well known gurus, but I really welcome the opportunity to hear some new voices in the Oracle community.

So if you didn’t vote yet, I recommend looking through the proposal lists. I bet you’ll find a few that you’ll want to vote for.

 

Just Return Any Random Row July 1, 2008

Filed under: development,nerdism,plsql — prodlife @ 2:13 am

Sometime I see a developer try to run something like “select deptno,sal from emp group by deptno”. I usually ask the developer “But employees in the department have many different salaries. Which one do you want?”, and sometimes I get an amazing answer: “I don’t really care, I just want to see any random salary”. Usually, I tell her to aggregate by “max” or something similar, if she doesn’t care about the result.

But today I was really annoyed. So I wrote a custom aggregation function that will return a random salary.  I admit, it is not as useful as str_agg, but there seem to be few developers who are interested in this feature.

The main challenge was to make the aggregation truly random. When you aggregate, you always have the current aggregation value and a new one. I have to randomly choose one of them – but I can’t make it a simple 50/50 selection.

Suppose I have three rows. The way aggregation works, I first take two rows and flip a coin to pick one. Now I have a current value – and I have to take the third row and decide if I want to keep the current value or the new one. I can’t flip the coin again – because if the third row has 50% chance to be selected, this means the first and second rows only have 25% chance each. Not fair. So I need to give the third row 1/3 chance, and the current value 2/3.

create or replace type agg_t as object (

    curr_value number,
    running_count number,

    static function ODCIAggregateInitialize(sctx  in out agg_t)
                    return number,

    member function ODCIAggregateIterate   (self  in out agg_t,
                                            new_value in number)
                    return number,

    member function ODCIAggregateTerminate (self         in     agg_t   ,
                                            return_value    out number,
                                            flags        in number      )
                    return number,

    member function ODCIAggregateMerge(self in out agg_t,
                                       ctx2 in agg_t    )
                    return number
);
/

create or replace type body agg_t is 

    static function ODCIAggregateInitialize(sctx in out agg_t)
        return number is
    begin
        sctx := agg_t(null,0);
        return ODCIConst.Success;
    end;

    member function ODCIAggregateIterate(
      self in out agg_t, new_value in number)
        return number is
    begin
        running_count:=running_count+1;
        if (mod(dbms_random.random,running_count)=0) then
        	curr_value:=new_value;
        end if;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateTerminate(self in agg_t,
        return_value out number, flags in number) return number is
    begin
        return_value := curr_value;
        return ODCIConst.Success;
    end;

    member function ODCIAggregateMerge(self in out agg_t,
        ctx2 in agg_t) return number is
    begin

    	if (mod(dbms_random.random,running_count+ctx2.running_count)<ctx2.running_count) then
    		curr_value:=ctx2.curr_value;
    	end if;
        running_count:=running_count+ctx2.running_count;
        return ODCIConst.Success;
    end;
end;
/

create or replace function agg_random (input number) return number
    parallel_enable aggregate using agg_t;
/

So easy! I love user defined aggregations! But you have to be careful when writing them. I accidentally replaced “number” with “varchar” somewhere in the code and got a lovely error message when I tried to run it:

SQL> select deptno,agg_random(sal) from emp group by deptno;
select deptno,agg_random(sal) from emp group by deptno
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 2104
Session ID: 138 Serial number: 1496

And in the alert log: Exception [type: ACCESS_VIOLATION, UNABLE_TO_WRITE] [ADDR:0x0] [PC:0x3D31168, ___intel_new_memcpy()+40]
I segfaulted! and you can imagine how fun it was to debug my pl/sql code using the trace file…

Anyway, after all the debugging is done, its time to show the code to my developer:

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            1300
        20            2975
        30            1600

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            2975
        30             950

SQL> select deptno,agg_random(sal) from emp group by deptno;

    DEPTNO AGG_RANDOM(SAL)
---------- ---------------
        10            2450
        20            3000
        30            2850

But I still wasn’t happy. What if I have a bug and the selection is not random enough? After all, the values in the example seems a bit repetitive.

No problem. I’ll use my function on a table containing numbers 1 to 100, pick few random numbers, and then use the well known chi-square test to check if my random selection matches the uniform distribution.

SQL> create table r1 as select level l from dual connect by level<100;

Table created.

SQL> create table r2 (n number);

Table created.

SQL> insert into r2 select agg_random(l) from r1 group by ();

1 row created.

-- repeated 20 times

SQL> select count(*) from r2;

  COUNT(*)
----------
        20
        
        
SQL> DECLARE
  2   sig   NUMBER;
  3   alpha NUMBER := 1;
  4   beta  NUMBER := 99;
  5   ttype VARCHAR2(20) := 'CHI_SQUARED';
  6  BEGIN
  7    dbms_stat_funcs.uniform_dist_fit('SCOTT', 'R2', 'N', 'DISCRETE', ttype, a
lpha, beta, sig);
  8
  9    dbms_output.put_line(sig);
 10  END;
 11  /
X-squared value : .9000000000000000000000000000000000000026 
Degree of freedom : 18.0000000013919621366679664

PL/SQL procedure successfully completed.

90% probability of getting my result assuming the function is good. I’m so happy. I knew that getting a degree in statistics was not a complete waste of time :)

 

 
Follow

Get every new post delivered to your Inbox.

Join 48 other followers