What is data densification?
Data densification is the process of converting sparse data into dense form. Some times
you may want to see a row output with zero measure from the fact table for a given
combination of dimension values even though no data exists in the fact table. This may
help to load the output result set into another application (for example excel sheet) for
further analysis.
Oracle 10g introduced a new partitioned join that helps data densification issues.
The syntax is:
SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
RIGHT OUTER JOIN table_reference
SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
LEFT OUTER JOIN table_reference
Note that FULL OUTER JOIN is not supported with a partitioned outer join.
The next example demonstrates the use of PARTITIONED OUTER JOIN.
Let us start with 2 tables.
SQL> desc customer
Name Null? Type
----------------- -------- -----------------------
CUST_ID NUMBER(10)
CUST_NAME VARCHAR2(20)
ADDR VARCHAR2(50)
STATE CHAR(2)
SQL> desc cust_orders
Name Null? Type
----------------- -------- -----------------------
ORDER_ID NUMBER(10)
CUST_ID NUMBER(10)
ORDER_DATE DATE
AMT NUMBER(10)
SQL> select * from customer order by state ;
CUST_ID CUST_NAME ADDR ST
---------- -------------------- ------------ --
1001 NAME1 10 JFK RD FL
1009 NAME9 10 JFK RD GA
1003 NAME3 10 JFK RD GA
1007 NAME7 10 JFK RD GA
1013 NAME13 10 JFK RD GA
1019 NAME19 10 JFK RD GA
1008 NAME8 10 JFK RD NC
1006 NAME6 10 JFK RD NC
1004 NAME4 10 JFK RD NC
1002 NAME2 10 JFK RD NC
1018 NAME18 10 JFK RD NC
1016 NAME16 10 JFK RD NC
1014 NAME14 10 JFK RD NC
1012 NAME12 10 JFK RD NC
1017 NAME17 10 JFK RD SC
1011 NAME11 10 JFK RD SC
1015 NAME15 10 JFK RD TN
1005 NAME5 10 JFK RD TN
1020 NAME20 10 JFK RD TN
1010 NAME10 10 JFK RD TN
20 rows selected.
SQL> select * from cust_orders ;
ORDER_ID CUST_ID ORDER_DATE AMT
---------- ---------- ----------- ----------
1 1004 23-NOV-2006 38
2 1006 23-NOV-2006 83
3 1002 23-NOV-2006 46
4 1012 23-NOV-2006 72
5 1011 09-DEC-2006 68
6 1009 09-DEC-2006 23
7 1015 09-DEC-2006 44
8 1013 09-DEC-2006 65
9 1017 09-DEC-2006 34
10 1003 19-NOV-2006 68
11 1005 18-NOV-2006 82
12 1006 17-NOV-2006 48
13 1011 16-NOV-2006 46
14 1011 15-NOV-2006 71
15 1004 14-NOV-2006 27
16 1014 13-NOV-2006 90
17 1010 12-NOV-2006 10
18 1005 11-NOV-2006 16
19 1017 10-NOV-2006 97
20 1012 09-NOV-2006 29
21 1001 08-NOV-2006 91
22 1015 07-NOV-2006 66
23 1004 06-NOV-2006 10
24 1001 05-NOV-2006 95
25 1017 04-NOV-2006 24
26 1006 03-NOV-2006 75
27 1002 02-NOV-2006 81
28 1005 01-NOV-2006 74
29 1014 31-OCT-2006 86
30 1010 30-OCT-2006 31
31 1012 29-OCT-2006 37
32 1012 28-OCT-2006 13
33 1013 27-OCT-2006 71
34 1005 26-OCT-2006 63
35 1007 25-OCT-2006 35
36 1016 24-OCT-2006 88
37 1004 23-OCT-2006 90
38 1006 22-OCT-2006 48
39 1016 21-OCT-2006 35
40 1008 20-OCT-2006 47
41 1011 19-OCT-2006 86
42 1008 18-OCT-2006 34
43 1019 17-OCT-2006 95
44 1012 16-OCT-2006 27
45 1017 15-OCT-2006 77
46 1007 14-OCT-2006 50
47 1006 13-OCT-2006 27
48 1005 12-OCT-2006 69
49 1012 11-OCT-2006 97
50 1005 10-OCT-2006 59
50 rows selected.
I have nicely populated customer and cust_orders tables.
Our requirement is to get the monthly total order amount by state, when the state has
not received any order, show 0 amount.
Let us first see what we get without partitioned outer join clause.
SQL> select b.state state,
2 trunc(order_date,'MONTH') SDATE,
3 sum(a.amt) mon_total
4 from cust_orders a , customer b
5 where a.cust_id = b.cust_id
6 group by b.state, trunc(order_date,'MONTH')
7* order by 1, 2
SQL> /
ST SDATE MON_TOTAL
-- ----------- ---------
FL 01-NOV-2006 186.00
GA 01-OCT-2006 251.00
GA 01-NOV-2006 68.00
GA 01-DEC-2006 88.00
NC 01-OCT-2006 629.00
NC 01-NOV-2006 599.00
SC 01-OCT-2006 163.00
SC 01-NOV-2006 238.00
SC 01-DEC-2006 102.00
TN 01-OCT-2006 222.00
TN 01-NOV-2006 248.00
TN 01-DEC-2006 44.00
12 rows selected.
But the above output is not wanted by you. You wanted to see 0 total for the FL state
for the month OCT and Dec and 0 total for NC for the DEC month.
Now I changed the SQL statement to include PARTITION OUTER JOIN clause.
SQL> select v1.state, to_char(v2.sdate,'YYYY-MM') smont,
2 nvl(v1.mon_total ,0) mon_total
3 from (
4 select b.state state,
5 trunc(order_date,'MONTH') SDATE,
6 sum(a.amt) mon_total
7 from cust_orders a , customer b
8 where a.cust_id = b.cust_id
9 group by b.state, trunc(order_date,'MONTH')
10 ) V1
11 PARTITION BY (state)
12 RIGHT OUTER JOIN (
13 select trunc(add_months(to_date('01-10-2006','DD-MM-YYYY'),rownum
-1),'MONTH') SDATE
14 from dual
15 connect by level <= 3 ) V2
16 ON V1.SDATE = V2.SDATE
17* order by 1,2
SQL> /
ST SMONT MON_TOTAL
-- ------- ---------
FL 2006-10 .00 -------- additional row
FL 2006-11 186.00
FL 2006-12 .00 -------- additional row
GA 2006-10 251.00
GA 2006-11 68.00
GA 2006-12 88.00
NC 2006-10 629.00
NC 2006-11 599.00
NC 2006-12 .00 -------- additional row
SC 2006-10 163.00
SC 2006-11 238.00
SC 2006-12 102.00
TN 2006-10 222.00
TN 2006-11 248.00
TN 2006-12 44.00
15 rows selected.
How the above SQL works:
The line # 11, PARTITION BY (state) produces an independent sub set of rows from the
in-line V1 and right outer joined with in-line V2. The line # 16 is used to join the two in-
line views.
Note the 2nd in-line view, V2 to generate continuous months. It is no longer needed to
use ALL_SOURCE or DBA_OBJECTS to generate rows based on certain logic. A simple
“connect by level” will produce any number of rows in 10g.
Data densification is the process of converting sparse data into dense form. Some times
you may want to see a row output with zero measure from the fact table for a given
combination of dimension values even though no data exists in the fact table. This may
help to load the output result set into another application (for example excel sheet) for
further analysis.
Oracle 10g introduced a new partitioned join that helps data densification issues.
The syntax is:
SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
RIGHT OUTER JOIN table_reference
SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
LEFT OUTER JOIN table_reference
Note that FULL OUTER JOIN is not supported with a partitioned outer join.
The next example demonstrates the use of PARTITIONED OUTER JOIN.
Let us start with 2 tables.
SQL> desc customer
Name Null? Type
----------------- -------- -----------------------
CUST_ID NUMBER(10)
CUST_NAME VARCHAR2(20)
ADDR VARCHAR2(50)
STATE CHAR(2)
SQL> desc cust_orders
Name Null? Type
----------------- -------- -----------------------
ORDER_ID NUMBER(10)
CUST_ID NUMBER(10)
ORDER_DATE DATE
AMT NUMBER(10)
SQL> select * from customer order by state ;
CUST_ID CUST_NAME ADDR ST
---------- -------------------- ------------ --
1001 NAME1 10 JFK RD FL
1009 NAME9 10 JFK RD GA
1003 NAME3 10 JFK RD GA
1007 NAME7 10 JFK RD GA
1013 NAME13 10 JFK RD GA
1019 NAME19 10 JFK RD GA
1008 NAME8 10 JFK RD NC
1006 NAME6 10 JFK RD NC
1004 NAME4 10 JFK RD NC
1002 NAME2 10 JFK RD NC
1018 NAME18 10 JFK RD NC
1016 NAME16 10 JFK RD NC
1014 NAME14 10 JFK RD NC
1012 NAME12 10 JFK RD NC
1017 NAME17 10 JFK RD SC
1011 NAME11 10 JFK RD SC
1015 NAME15 10 JFK RD TN
1005 NAME5 10 JFK RD TN
1020 NAME20 10 JFK RD TN
1010 NAME10 10 JFK RD TN
20 rows selected.
SQL> select * from cust_orders ;
ORDER_ID CUST_ID ORDER_DATE AMT
---------- ---------- ----------- ----------
1 1004 23-NOV-2006 38
2 1006 23-NOV-2006 83
3 1002 23-NOV-2006 46
4 1012 23-NOV-2006 72
5 1011 09-DEC-2006 68
6 1009 09-DEC-2006 23
7 1015 09-DEC-2006 44
8 1013 09-DEC-2006 65
9 1017 09-DEC-2006 34
10 1003 19-NOV-2006 68
11 1005 18-NOV-2006 82
12 1006 17-NOV-2006 48
13 1011 16-NOV-2006 46
14 1011 15-NOV-2006 71
15 1004 14-NOV-2006 27
16 1014 13-NOV-2006 90
17 1010 12-NOV-2006 10
18 1005 11-NOV-2006 16
19 1017 10-NOV-2006 97
20 1012 09-NOV-2006 29
21 1001 08-NOV-2006 91
22 1015 07-NOV-2006 66
23 1004 06-NOV-2006 10
24 1001 05-NOV-2006 95
25 1017 04-NOV-2006 24
26 1006 03-NOV-2006 75
27 1002 02-NOV-2006 81
28 1005 01-NOV-2006 74
29 1014 31-OCT-2006 86
30 1010 30-OCT-2006 31
31 1012 29-OCT-2006 37
32 1012 28-OCT-2006 13
33 1013 27-OCT-2006 71
34 1005 26-OCT-2006 63
35 1007 25-OCT-2006 35
36 1016 24-OCT-2006 88
37 1004 23-OCT-2006 90
38 1006 22-OCT-2006 48
39 1016 21-OCT-2006 35
40 1008 20-OCT-2006 47
41 1011 19-OCT-2006 86
42 1008 18-OCT-2006 34
43 1019 17-OCT-2006 95
44 1012 16-OCT-2006 27
45 1017 15-OCT-2006 77
46 1007 14-OCT-2006 50
47 1006 13-OCT-2006 27
48 1005 12-OCT-2006 69
49 1012 11-OCT-2006 97
50 1005 10-OCT-2006 59
50 rows selected.
I have nicely populated customer and cust_orders tables.
Our requirement is to get the monthly total order amount by state, when the state has
not received any order, show 0 amount.
Let us first see what we get without partitioned outer join clause.
SQL> select b.state state,
2 trunc(order_date,'MONTH') SDATE,
3 sum(a.amt) mon_total
4 from cust_orders a , customer b
5 where a.cust_id = b.cust_id
6 group by b.state, trunc(order_date,'MONTH')
7* order by 1, 2
SQL> /
ST SDATE MON_TOTAL
-- ----------- ---------
FL 01-NOV-2006 186.00
GA 01-OCT-2006 251.00
GA 01-NOV-2006 68.00
GA 01-DEC-2006 88.00
NC 01-OCT-2006 629.00
NC 01-NOV-2006 599.00
SC 01-OCT-2006 163.00
SC 01-NOV-2006 238.00
SC 01-DEC-2006 102.00
TN 01-OCT-2006 222.00
TN 01-NOV-2006 248.00
TN 01-DEC-2006 44.00
12 rows selected.
But the above output is not wanted by you. You wanted to see 0 total for the FL state
for the month OCT and Dec and 0 total for NC for the DEC month.
Now I changed the SQL statement to include PARTITION OUTER JOIN clause.
SQL> select v1.state, to_char(v2.sdate,'YYYY-MM') smont,
2 nvl(v1.mon_total ,0) mon_total
3 from (
4 select b.state state,
5 trunc(order_date,'MONTH') SDATE,
6 sum(a.amt) mon_total
7 from cust_orders a , customer b
8 where a.cust_id = b.cust_id
9 group by b.state, trunc(order_date,'MONTH')
10 ) V1
11 PARTITION BY (state)
12 RIGHT OUTER JOIN (
13 select trunc(add_months(to_date('01-10-2006','DD-MM-YYYY'),rownum
-1),'MONTH') SDATE
14 from dual
15 connect by level <= 3 ) V2
16 ON V1.SDATE = V2.SDATE
17* order by 1,2
SQL> /
ST SMONT MON_TOTAL
-- ------- ---------
FL 2006-10 .00 -------- additional row
FL 2006-11 186.00
FL 2006-12 .00 -------- additional row
GA 2006-10 251.00
GA 2006-11 68.00
GA 2006-12 88.00
NC 2006-10 629.00
NC 2006-11 599.00
NC 2006-12 .00 -------- additional row
SC 2006-10 163.00
SC 2006-11 238.00
SC 2006-12 102.00
TN 2006-10 222.00
TN 2006-11 248.00
TN 2006-12 44.00
15 rows selected.
How the above SQL works:
The line # 11, PARTITION BY (state) produces an independent sub set of rows from the
in-line V1 and right outer joined with in-line V2. The line # 16 is used to join the two in-
line views.
Note the 2nd in-line view, V2 to generate continuous months. It is no longer needed to
use ALL_SOURCE or DBA_OBJECTS to generate rows based on certain logic. A simple
“connect by level” will produce any number of rows in 10g.
Cheers, Tamil
No comments:
Post a Comment