Joining Unrelated Data

I’m sure this happened to every DBA – an important customer sends an Excel spreadsheet and explains that he wants this report generated from the DB on a weekly basis.

At first look the report looks easy enough and you say “Sure, you’ll have it ready by the end of the week”, but on second look something seems off and you give the customer a call “Listen, it seems that the data in the columns is not really related. First column contains your customers and the second contains the contact emails for each customer, which is fine. But the third column contains the open issues for the customer , and it is not really related to the contact emails column.”
And the customer calmly replies “You are right, it isn’t related. Thats how I want the report. I’ll have it ready by Friday, yes?”

No choice but to write a non-relational join. I do this by generating a row id column and joining on that.
For example:

SQL> create table customers (cust_id number, cust_name varchar(20));

Table created.

SQL> insert into customers values (1,'cust A');

1 row created.

SQL> insert into customers values (2,'cust B');

1 row created.

SQL> insert into customers values (3,'cust C');

1 row created.

SQL> create table contacts (cust_id number, email varchar(64));

Table created.

SQL> insert into contacts values (1,'a1@a.com');

1 row created.

SQL> insert into contacts values (1,'a2@a.com');

1 row created.

SQL> insert into contacts values (2,'b1@b.com');

1 row created.

SQL> insert into contacts values (3,'c1@c.com');

1 row created.

SQL> insert into contacts values (3,'c2@c.com');

1 row created.

SQL> insert into contacts values (3,'c3@c.com');

1 row created.

SQL> create table issues (cust_id number, description varchar(4000));

Table created.

SQL> insert into issues values (1,'Late delivery');

1 row created.

SQL> insert into issues values (1,'Missing components');

1 row created.

SQL> insert into issues values (2,'Wrong address');

1 row created.

SQL> insert into issues values (3,'Malfunctions');

1 row created.

SQL> col email format a10
SQL> col description format a20
SQL> select nvl(t1.cust_name,t2.cust_name),email,description from
  2  (select dense_rank() over (partition by cust_name order by email) n, cust_n
ame,email
  3  from customers cr join contacts cn on cr.cust_id=cn.cust_id) t1
  4  full outer join
  5  (select dense_rank() over (partition by cust_name order by description) n,
cust_name,description
  6  from customers cr join issues i on cr.cust_id=i.cust_id) t2
  7  on t1.cust_name=t2.cust_name and t1.n=t2.n;

NVL(T1.CUST_NAME,T2. EMAIL      DESCRIPTION
-------------------- ---------- --------------------
cust A               a1@a.com   Late delivery
cust A               a2@a.com   Missing components
cust B               b1@b.com   Wrong address
cust C               c1@c.com   Malfunctions
cust C               c2@c.com
cust C               c3@c.com

6 rows selected.

Very ugly. I hope the customer will be happy with his report.

Advertisements

2 Comments on “Joining Unrelated Data”

  1. with row_number and only once customers

    select * from
    (select row_number() over (partition by cust_id order by email) n, email, cust_id
    from contacts )
    full join
    (select row_number() over (partition by cust_id order by description) n, cust_id,
    description from issues)
    using (cust_id,n)
    join customers using (cust_id)

  2. Omer Zak says:

    I think this is a case of using the wrong tool for a job.
    The customer wants a report – look for report generation software, which is powerful enough to express this kind of report.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s