Saturday, November 5, 2011

Oracle Analytic Functions

The benefits of analytical function are many:
 1. Reduced IO
 2. Complex Query can be written in a simple SQL
 3. Less CPU time

Let us start with a small table.

SELECT * FROM T1 ;

EMPLID          PAY_DATE
--------------- -----------
A1              31-JAN-2000
A1              28-FEB-2000
A1              31-MAR-2000
A1              31-MAY-2000
B1              31-JAN-2000
B1              31-MAR-2000
B1              31-JUL-2000
C1              31-JAN-2000
C1              28-FEB-2000
C1              31-MAR-2000
C1              31-DEC-2000
D1              31-JAN-2000
D1              31-MAR-2000
D1              31-OCT-2000

14 rows selected.

Elapsed: 00:00:00.30

Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   4  consistent gets
   0  physical reads
   0  redo size
 569  bytes sent via SQL*Net to client
 253  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
  14  rows processed

The Table T1 has 14 rows and there are 4 distinct EMPLID.

Our requirement is:  List out employee with the last pay date.

Without analytical function, the following SQL performs the task.
I have used max function in the sub query.

SELECT EMPLID, PAY_DATE FROM T1 A
WHERE A.PAY_DATE = ( SELECT MAX(PAY_DATE) FROM T1 B
                WHERE B.EMPLID = A.EMPLID) ;

EMPLID          PAY_DATE
--------------- -----------
A1              31-MAY-2000
B1              31-JUL-2000
C1              31-DEC-2000
D1              31-OCT-2000

Elapsed: 00:00:00.80

Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
  16  consistent gets
   1  physical reads
   0  redo size
 319  bytes sent via SQL*Net to client
 253  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
   4  rows processed
The total number of logical IOs is 16.You can see FULL table scan is happening for the sub query also.
Execution Plan
----------------------------------------------------------
   0   SELECT STATEMENT Optimizer=CHOOSE
   1   0   FILTER
   2   1     TABLE ACCESS (FULL) OF 'T1'
   3   1     SORT (AGGREGATE)
   4   3       TABLE ACCESS (FULL) OF 'T1'

To speed up execution time, I create an index on emplid, pay_date columns

create index t1_idx on t1(emplid, pay_date)
tablespace psindex storage(initial 32k next 32k pctincrease 0) pctfree 0;

Execution Plan
----------------------------------------------------------
   0   SELECT STATEMENT Optimizer=CHOOSE
   1   0   FILTER
   2   1     TABLE ACCESS (FULL) OF 'T1'
   3   1     SORT (AGGREGATE)
   4   3       INDEX (RANGE SCAN) OF 'T1_IDX' (NON-UNIQUE)

Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   6  consistent gets
   0  physical reads
   0  redo size
 333  bytes sent via SQL*Net to client
 254  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
   4  rows processed
====================================================================
Still, 6 LIO are needed.

Let me change the query to use ROW_NUMBER function.

SELECT EMPLID, PAY_DATE
FROM ( SELECT EMPLID,
          PAY_DATE,
          (ROW_NUMBER() OVER
                   (PARTITION BY EMPLID ORDER BY PAY_DATE DESC)) RN
     FROM T1
    ORDER BY EMPLID, RN
 )
WHERE RN = 1 ;

EMPLID          PAY_DATE
--------------- -----------
A1              31-MAY-2000
B1              31-JUL-2000
C1              31-DEC-2000
D1              31-OCT-2000

Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   3  consistent gets
   0  physical reads
   0  redo size
 320  bytes sent via SQL*Net to client
 253  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   1  sorts (memory)
   0  sorts (disk)
   4  rows processed

====================================================================
I created an index T1_IDX on T1 table (EMPLID,PAY_DATE).

SELECT EMPLID, PAY_DATE
 FROM ( SELECT EMPLID,
               PAY_DATE,
     (ROW_NUMBER() OVER (PARTITION BY EMPLID ORDER BY PAY_DATE DESC)) RN
          FROM T1
         ORDER BY EMPLID, RN
    )
WHERE RN = 1;

A1              31-MAY-2000
B1              31-JUL-2000
C1              31-DEC-2000
D1              31-OCT-2000

Elapsed: 00:00:04.66

Execution Plan
----------------------------------------------------------
   0   SELECT STATEMENT Optimizer=CHOOSE
   1   0   VIEW
   2   1     WINDOW (SORT)
   3   2       TABLE ACCESS (FULL) OF 'T1'

rows will be truncated

Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   3  consistent gets
   0  physical reads
   0  redo size
 334  bytes sent via SQL*Net to client
 254  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   1  sorts (memory)
   0  sorts (disk)
   4  rows processed


You will frequently see WINDOW (SORT) in the execution plan when the SQL uses analytical
function. I get consistently 3 logical IOs whether an index is present on the table or not.
And Oracle sorts the each employee rows within a WINDOW.
TOP-N SQLselect * from emp;

EMPID      SMONTH            SAL
---------- ----------          ----------
AAA        JAN               100
BBB        JAN               200
CCC        JAN               300
DDD        JAN               400
CCC        FEB               400
BBB        FEB               300
DDD        FEB                30
AAA        FEB                80
AAA        MAR               100
BBB        MAR               200
CCC        MAR               300
DDD        MAR                20

12 rows selected.

Question: List top 3 salaried employees by month.

Answer:
SELECT EMPID, SMONTH, SAL
FROM ( SELECT EMPID, SMONTH, SAL, ROW_NUMBER()
         OVER (PARTITION BY SMONTH ORDER BY SAL DESC ) RN
FROM EMP)
WHERE RN <= 3
ORDER BY 2,1;

EMPID      SMONTH      SAL
----------    ----------      ----------
AAA        FEB                80
BBB        FEB               300
CCC        FEB               400
BBB        JAN               200
CCC        JAN               300
DDD        JAN               400
AAA        MAR               100
BBB        MAR               200
CCC        MAR               300

9 rows selected.
Ratio_To_Report Analytic FunctionSQL>DESC SALES
Name             Null?    Type
------------ -------- ---------------------
PRODUCT              VARCHAR2(20)
SALES                    NUMBER


SQL>SELECT * FROM SALES ;

PRODUCT                   SALES
-------------------- ----------
P1                          280
P2                          400
P3                          200
P4                          300
P5                          190
P6                          310

6 rows selected.

Elapsed: 00:00:00.50

SQL>SELECT PRODUCT, SALES ,
                 TO_CHAR((RATIO_TO_REPORT(SALES)
                                OVER())*100,999.99) AS SALES_PER
    FROM SALES ;

PRODUCT           SALES SALES_P
--------------------   ---------- -------
P1                          280   16.67
P2                          400   23.81
P3                          200   11.90
P4                          300   17.86
P5                          190   11.31
P6                          310   18.45

6 rows selected.

Oracle is continuously introducing new analytical functions in each release.
Check out the manual for the enhancements.

No comments:

Post a Comment