Different people write SQL queries differently.  More often than not, the approach reflects the person’s way of life.  Some people draw all the relationships out in a diagram, then write the query adding one table at a time.  Some people just write the queries out straight.  There is also this other kind that dumps 20+ tables into a BI query report tool and wonder why the query result is so much bigger than it is supposed to be.  Hacking a way out of a mess is not going to produce quality work.

I am the first kind.  Knowing all the relationships before writing the query gives you a pretty good idea of how things are laid out.

When you were the last type mentioned above and you had this massive web of table relationships to make sense out of, you would for sure miss out some details.  One such hidden details that people would overlook is a phenomenon that I call the Cartesian Product of Unrelated Children, for lack of a better term.  Read on.

Cartesian Product

When we write queries, we always try to make sure that we don’t have a Cartesian Product(CP).  And it’s easy to tell if we have one.  If you have 4 tables, you should have 3 joins.  If you miss out one join, you get a CP.  The outcome is that the query is split into 2 parts: one where the tables are joined together and the other with the one table left out.  Then the 2 parts’ records are “multiplied” together, hence the name Cartesian Product.

So in this example, if table A has 1 record, which has 5 child records in table B.  And then those 5 records have 25 child records in table C.  Table D has 5 records.  The end result is going to be 125 records.

However, have you seen a CP but without a missing join?




The Unrelated Children

This is the kind of problem like a needle you can’t pick out from a haystack.  The following simplified example is perfectly fine in data modeling.  For example, a person can have many books and also can have many cars.

However, if you write this out and put all these tables in a query, you’ll get a CP, unless only one of the children tables has multiple related records and all others have just one record.  If a specific person record has 2 associated book records and has 2 associated car records, I the end result forms a CP of 4 records.

The Example In Action

So the following example depicts a scenario where one person owns 3 books and 3 cars.  When queries altogether, it becomes a cartesian product of 9 records.

Script

create table person (pk number primary key);
create table book (pk number primary key, fk number);
alter table book add constraint fk1 foreign key (fk) references person (pk);
create table car (pk number primary key, fk number);
alter table car add constraint fk2 foreign key (fk) references person (pk);

insert into person values (1);
insert into book values (1,1);
insert into book values (2,1);
insert into book values (3,1);
insert into car values (1,1);
insert into car values (2,1);
insert into car values (3,1);

select count(1)
from person p, book b, car c
where p.pk = b.fk
and p.pk = c.fk;
drop table book;
drop table car;
drop table person;




Output

07:00:24 SQL> create table person (pk number primary key);
Table created.
Elapsed: 00:00:00.01
07:00:26 SQL>
07:00:26 SQL> create table book (pk number primary key, fk number);
Table created.
Elapsed: 00:00:00.01
07:00:30 SQL> alter table book add constraint fk1 foreign key (fk) references person (pk);
Table altered.
Elapsed: 00:00:00.01
07:00:34 SQL> create table car (pk number primary key, fk number);
Table created.
Elapsed: 00:00:00.02
07:00:39 SQL> alter table car add constraint fk2 foreign key (fk) references person (pk);
Table altered.
Elapsed: 00:00:00.01
07:00:42 SQL> insert into person values (1);
1 row created.
Elapsed: 00:00:00.02
07:00:56 SQL> insert into book values (1,1);
1 row created.
Elapsed: 00:00:00.00
07:00:59 SQL> insert into book values (2,1);
1 row created.
Elapsed: 00:00:00.00
07:01:03 SQL> insert into book values (3,1);
1 row created.
Elapsed: 00:00:00.00
07:01:06 SQL> insert into car values (1,1);
1 row created.
Elapsed: 00:00:00.00
07:01:09 SQL> insert into car values (2,1);
1 row created.
Elapsed: 00:00:00.00
07:01:13 SQL> insert into car values (3,1);
1 row created.
Elapsed: 00:00:00.00
07:01:21 SQL> select count(1)
from person p, book b, car c
where p.pk = b.fk
and p.pk = c.fk;
07:01:26 2 07:01:26 3 07:01:26 4
COUNT(1)
----------
9
1 row selected.
Elapsed: 00:00:00.01
07:01:26 SQL> drop table book;
drop table car;
drop table person;
Table dropped.
Elapsed: 00:00:00.03
07:01:38 SQL>
Table dropped.
Elapsed: 00:00:00.03
07:01:38 SQL>
Table dropped.
Elapsed: 00:00:00.03

The Bottom Line

We simply should not include two or more unrelated children tables in a query.  Oracle does not have a way to determine the relationship between the two or more child tables, other than the fact that they both are child tables of the parent.  But since they’re in the same query, Oracle would have to merge them together somehow, thereby creating the CP.  The only exception is when you know for sure that at least one of the child tables would return only one single row for each parent row.  But even so, the data model allows multiple children rows.  The query would not work consistently for other cases.  In this case, what I would definitely do is to split off the child tables into separate queries.

Have fun coding!

References

https://en.wikipedia.org/wiki/Cartesian_product

Leave a Reply

Your email address will not be published. Required fields are marked *