Saturday, November 5, 2011

Data Densification

What is data densification?
Data densification is the process of converting sparse data into dense form. Some times
you may want to see a row output with zero measure from the fact table for a given
combination of dimension values even though no data exists in the fact table. This may
help to load the output result set into another application (for example excel sheet) for
further analysis.

Oracle 10g introduced a new partitioned join that helps data densification issues.
The syntax is:
SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
RIGHT OUTER JOIN table_reference

SELECT …….
FROM table_reference
PARTITION BY (expr [, expr ] ….)
LEFT OUTER JOIN table_reference

Note that FULL OUTER JOIN is not supported with a partitioned outer join.

The next example demonstrates the use of PARTITIONED OUTER JOIN.

Let us start with 2 tables.
SQL> desc customer
Name               Null?    Type
-----------------  -------- -----------------------
CUST_ID                    NUMBER(10)
CUST_NAME              VARCHAR2(20)
ADDR                         VARCHAR2(50)
STATE                       CHAR(2)

SQL> desc cust_orders
Name              Null?    Type
----------------- -------- -----------------------
ORDER_ID                   NUMBER(10)
CUST_ID                      NUMBER(10)
ORDER_DATE             DATE
AMT                              NUMBER(10)


SQL> select * from customer order by state ;
CUST_ID CUST_NAME            ADDR      ST                  
----------     -------------------- ------------      --                  
    1001 NAME1                10 JFK RD    FL                  
    1009 NAME9                10 JFK RD    GA                  
    1003 NAME3                10 JFK RD    GA                  
    1007 NAME7                10 JFK RD    GA                  
    1013 NAME13               10 JFK RD    GA                  
    1019 NAME19               10 JFK RD    GA                  
    1008 NAME8                10 JFK RD    NC                  
    1006 NAME6                10 JFK RD    NC                  
    1004 NAME4                10 JFK RD    NC                  
    1002 NAME2                10 JFK RD    NC                  
    1018 NAME18               10 JFK RD    NC                  
    1016 NAME16               10 JFK RD    NC                  
    1014 NAME14               10 JFK RD    NC                  
    1012 NAME12               10 JFK RD    NC                  
    1017 NAME17               10 JFK RD    SC                  
    1011 NAME11               10 JFK RD    SC                  
    1015 NAME15               10 JFK RD    TN                  
    1005 NAME5                10 JFK RD    TN                  
    1020 NAME20               10 JFK RD    TN                  
    1010 NAME10               10 JFK RD    TN                  

20 rows selected.

SQL> select * from cust_orders ;

ORDER_ID    CUST_ID ORDER_DATE         AMT                     
---------- ---------- ----------- ----------                     
       1       1004 23-NOV-2006         38                     
       2       1006 23-NOV-2006         83                     
       3       1002 23-NOV-2006         46                     
       4       1012 23-NOV-2006         72                     
       5       1011 09-DEC-2006         68                     
       6       1009 09-DEC-2006         23                     
       7       1015 09-DEC-2006         44                     
       8       1013 09-DEC-2006         65                     
       9       1017 09-DEC-2006         34                     
      10       1003 19-NOV-2006         68                     
      11       1005 18-NOV-2006         82                     
      12       1006 17-NOV-2006         48                     
      13       1011 16-NOV-2006         46                     
      14       1011 15-NOV-2006         71                     
      15       1004 14-NOV-2006         27                     
      16       1014 13-NOV-2006         90                     
      17       1010 12-NOV-2006         10                     
      18       1005 11-NOV-2006         16                     
      19       1017 10-NOV-2006         97                     
      20       1012 09-NOV-2006         29                     
      21       1001 08-NOV-2006         91                     
      22       1015 07-NOV-2006         66                     
      23       1004 06-NOV-2006         10                     
      24       1001 05-NOV-2006         95                     
      25       1017 04-NOV-2006         24                     
      26       1006 03-NOV-2006         75                     
      27       1002 02-NOV-2006         81                     
      28       1005 01-NOV-2006         74                     
      29       1014 31-OCT-2006         86                     
      30       1010 30-OCT-2006         31                     
      31       1012 29-OCT-2006         37                     
      32       1012 28-OCT-2006         13                     
      33       1013 27-OCT-2006         71                     
      34       1005 26-OCT-2006         63                     
      35       1007 25-OCT-2006         35                     
      36       1016 24-OCT-2006         88                     
      37       1004 23-OCT-2006         90                     
      38       1006 22-OCT-2006         48                     
      39       1016 21-OCT-2006         35                     
      40       1008 20-OCT-2006         47                     
      41       1011 19-OCT-2006         86                     
      42       1008 18-OCT-2006         34                     
      43       1019 17-OCT-2006         95                     
      44       1012 16-OCT-2006         27                     
      45       1017 15-OCT-2006         77                     
      46       1007 14-OCT-2006         50                     
      47       1006 13-OCT-2006         27                     
      48       1005 12-OCT-2006         69                     
      49       1012 11-OCT-2006         97                     
      50       1005 10-OCT-2006         59                     

50 rows selected.

I have nicely populated customer and cust_orders tables.

Our requirement is to get the monthly total order amount  by state, when the state has
not received any order, show 0 amount.

Let us first see what we get without partitioned outer join clause.

SQL> select b.state state,
2         trunc(order_date,'MONTH') SDATE,
3         sum(a.amt) mon_total
4    from cust_orders a , customer b
5    where a.cust_id = b.cust_id
6    group by b.state, trunc(order_date,'MONTH')
7* order by 1, 2
SQL> /

ST SDATE              MON_TOTAL
--    -----------           ---------
FL 01-NOV-2006    186.00
GA 01-OCT-2006    251.00
GA 01-NOV-2006     68.00
GA 01-DEC-2006     88.00
NC 01-OCT-2006    629.00
NC 01-NOV-2006    599.00
SC 01-OCT-2006    163.00
SC 01-NOV-2006    238.00
SC 01-DEC-2006    102.00
TN 01-OCT-2006    222.00
TN 01-NOV-2006    248.00
TN 01-DEC-2006     44.00

12 rows selected.

But the above output is not wanted by you.  You wanted to see 0 total for the FL state
for the month OCT and Dec and 0 total for NC for the DEC month.

Now I changed the SQL statement to include PARTITION OUTER JOIN clause.

SQL> select v1.state, to_char(v2.sdate,'YYYY-MM') smont,
2          nvl(v1.mon_total ,0) mon_total
3  from (
4       select b.state state,
5              trunc(order_date,'MONTH') SDATE,
6              sum(a.amt) mon_total
7       from cust_orders a , customer b
8       where a.cust_id = b.cust_id
9       group by b.state, trunc(order_date,'MONTH')
10       ) V1
11  PARTITION BY (state)
12  RIGHT OUTER JOIN (
13  select trunc(add_months(to_date('01-10-2006','DD-MM-YYYY'),rownum
-1),'MONTH') SDATE
14       from  dual
15       connect by level <= 3 )  V2
16  ON V1.SDATE = V2.SDATE
17* order by 1,2
SQL> /

ST SMONT     MON_TOTAL                                             
-- -------           ---------                                             
FL 2006-10       .00  -------- additional row  
FL 2006-11    186.00                       
FL 2006-12       .00  -------- additional row  
GA 2006-10    251.00                       
GA 2006-11     68.00                       
GA 2006-12     88.00                      
NC 2006-10    629.00                   
NC 2006-11    599.00                                             
NC 2006-12       .00  -------- additional row
SC 2006-10    163.00                      
SC 2006-11    238.00                  
SC 2006-12    102.00             
TN 2006-10    222.00                     
TN 2006-11    248.00                     
TN 2006-12     44.00                   

15 rows selected.

How the above SQL works:
The line # 11, PARTITION BY (state) produces an independent sub set of rows from the
in-line V1 and right outer joined with in-line V2.  The line # 16 is used to join the two in-
line views.

Note the 2nd in-line view, V2 to generate continuous months. It is no longer needed to
use ALL_SOURCE or DBA_OBJECTS to generate rows based on certain logic. A simple
“connect by level” will produce any number of rows in 10g.

Cheers, Tamil

No comments:

Post a Comment