Friday, November 25, 2011

SQL Loader Performance

First Written Date: Nov 14, 2005

The SQL*Loader utility has been around for many years, and many of us have used it to load data into the database.
To date I have not seen a third-party utility that loads data faster than SQL*Loader. It is one of my favorite
utilities, and I enjoy working with it. There are minor enhancements in 9i; please go through the manual if you want to learn
more.

SQL*Loader provides two methods of loading data:

1 Conventional Path Load - Employs SQL to insert data, thus generates REDO and UNDO

2 Direct Path Load – Bypasses SQL and writes formatted data blocks directly into the
                         table's segment, above the high-water mark. Hence, NO UNDO for the
                         loaded data, and redo can be minimized as well (e.g., with NOLOGGING
                         or the UNRECOVERABLE option).
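
To see what each method actually generates on your own system, a rough check is possible. Below is a minimal sketch (my addition, assuming OS authentication as in the scripts later in this post, and an otherwise-idle test instance so the deltas are attributable to the load):

#!/bin/ksh
# Sketch: snapshot the system-wide redo/undo statistics. Run once before
# and once after a load; the deltas show what the load generated.
sqlplus -s / <<'EOF'
COLUMN name  FORMAT A30
COLUMN value FORMAT 999,999,999,999
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('redo size', 'undo change vector size');
EXIT;
EOF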

In general, a conventional path load is always the slower of the two, so beyond a baseline benchmark I am not going to discuss it.

Most DBAs and developers never examine the various options available for direct path loads.

Next I am going to demonstrate 3 different options under direct path load:

1. Simple Direct Path Load – No Parallel

2. Parallel Direct Path Load with multiple data files

3. Parallel Direct Path Load with single data file

I have a SALES table into which I am going to load millions of records. The table looks like:

Table Definition
SQL> desc sales

 Name                 Null?    Type
 -------------------- -------- --------------
 CUST_ID                       NUMBER(12)
 STATE                         CHAR(2)
 SALE_YEAR                     NUMBER(4)
 SALE_MONTH                    NUMBER(2)
 PROD_ID                       NUMBER
 QTY                           NUMBER
 AMT                           NUMBER
 COMM_1                        VARCHAR2(50)
 COMM_2                        VARCHAR2(200)
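
For anyone reproducing the test, a CREATE TABLE consistent with the desc output above would be (storage and tablespace clauses omitted; add your own):

CREATE TABLE sales
( cust_id     NUMBER(12),
  state       CHAR(2),
  sale_year   NUMBER(4),
  sale_month  NUMBER(2),
  prod_id     NUMBER,
  qty         NUMBER,
  amt         NUMBER,
  comm_1      VARCHAR2(50),
  comm_2      VARCHAR2(200)
);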
Next, I created a control file; its definition is below.

Control file:  load_sales.ctl

LOAD DATA
APPEND
INTO TABLE TAMIL.SALES
FIELDS TERMINATED BY ',' TRAILING NULLCOLS
(
CUST_ID   ,  STATE ,  SALE_YEAR  ,  SALE_MONTH  ,
PROD_ID   ,  QTY  ,  AMT    ,  COMM_1  ,  COMM_2
)
The above control file is used for all of the tests below.
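
Given the field list, the loader expects comma-separated records along these lines (values invented for illustration; TRAILING NULLCOLS lets the trailing comment fields be omitted):

100000001,CA,2005,01,501,3,149.97,store pickup,repeat customer
100000002,NY,2005,02,742,1,19.99,,
100000003,TX,2005,02,613,2,59.98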

The data file is sales_data.dat; it has around 14.3 million rows and is about 2.8 GB in size.

To complete the exercise, I first benchmarked the load performance in conventional mode.

I. Conventional Mode

Script:  load_sales_conv.ksh
#!/bin/ksh
date
rm /u01/oratest/arch/*.arc        # clear archived redo logs from earlier runs
sqlldr / control=load_sales.ctl  data=sales_data.dat log=load_sales_conv.log \
bindsize=4194304 rows=3000
date
exit 0
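
Before reading the log, a quick sanity check is worth running after any load. A sketch (my addition; it assumes OS authentication and the paths used above, and the log's discarded records account for any small difference):

#!/bin/ksh
# Sketch: compare the table row count against the line count of the input file.
sqlplus -s / <<'EOF'
SELECT COUNT(*) FROM tamil.sales;
EXIT;
EOF
wc -l /u01/oratest/sales_data.dat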

Log file: load_sales_conv.log
SQL*Loader: Release 9.2.0.5.0 - Production on Thu Nov 10 23:37:04 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Control File:   load_sales.ctl
Data File:      sales_data.dat
Bad File:     sales_data.bad
Discard File:  none specified
(Allow all discards)
Table TAMIL.SALES:
14,336,064 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
2 Rows not loaded because all fields were null.
Space allocated for bind array:                4193532 bytes(1806 rows)
Read   buffer bytes: 4194304
Total logical records skipped:          0
Total logical records read:      14336066
Total logical records rejected:         0
Total logical records discarded:        2
Run began on Thu Nov 10 23:37:04 2005
Run ended on Thu Nov 10 23:45:21 2005
Elapsed time was:     00:08:17.84

CPU time was:           00:03:08.05

The parameters I used with SQL*Loader are bindsize and rows.
bindsize -- Size of conventional path bind array in bytes  (Default 256000)
rows -- Number of rows in conventional path bind array or between direct path data saves
     (Default: Conventional path 64, Direct path all)
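
Note how these two parameters interact in the log above: ROWS=3000 was requested, but BINDSIZE caps the array. For this table, each row's slot in the bind array works out to 2322 bytes, so:

    4194304 / 2322 = 1806 rows (rounded down)
    1806 x 2322    = 4193532 bytes

which is exactly the "4193532 bytes(1806 rows)" reported in the log.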

The conventional method consumed 8 minutes. Obviously, it ran slowly, and I am sure you would not want to run this
mode for large loads in your production environment.

Let us move on to study the frequently used method: the direct method.

II. Direct Mode

Script: load_sales_direct.ksh
#!/bin/ksh
date
## rm /u01/oratest/arch/*.arc
## 512KB = <524288>
## 1 MB =<1048576>
## 2 MB = <2097152>
## 4 MB = <4194304>
## 8 MB = <8388608>

sqlldr / control=load_sales.ctl bad=bad_direct.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_direct.log columnarrayrows=8000 \
multithreading=true direct=true
date
exit 0

In the above script, I used STREAMSIZE, READSIZE, COLUMNARRAYROWS, and MULTITHREADING in addition to
DIRECT=TRUE, where:
streamsize -- Size of direct path stream buffer in bytes  (Default 256000)
readsize -- Size of Read buffer                  (Default 1048576)
multithreading -- use multithreading in direct path
columnarrayrows -- Number of rows for direct path column array  (Default 5000)

I repeated the load test with different combinations of values for these four parameters; the best result is shown in the log excerpt after the sketch below.
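
A simple way to iterate over candidate buffer sizes is a harness like this (a sketch, my addition; it truncates TAMIL.SALES between runs, so use a scratch copy of the schema):

#!/bin/ksh
# Sketch of a tuning harness: rerun the same direct load with different
# stream/read buffer sizes and compare elapsed times from the logs.
for sz in 524288 1048576 2097152 4194304 8388608
do
  sqlplus -s / <<'EOF'
TRUNCATE TABLE tamil.sales;
EXIT;
EOF
  sqlldr / control=load_sales.ctl data=sales_data.dat direct=true \
         multithreading=true columnarrayrows=8000 \
         streamsize=$sz readsize=$sz log=load_sales_direct_$sz.log
done
grep "Elapsed time" load_sales_direct_*.log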

Log file: load_sales_direct.log
…..
….
Table TAMIL.SALES:
14336064 Rows successfully loaded.
0 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
2 Rows not loaded because all fields were null.
Bind array size not used in direct path.
Column array  rows :    8000
Stream buffer bytes: 1048576
Read   buffer bytes: 1048576
Total logical records skipped:          0
Total logical records read:      14336066
Total logical records rejected:         0
Total logical records discarded:        2
Total stream buffers loaded by SQL*Loader main thread:     2749
Total stream buffers loaded by SQL*Loader load thread:     2748
Run began on Fri Nov 11 03:08:34 2005

Run ended on Fri Nov 11 03:10:29 2005

Elapsed time was:     00:01:54.79
CPU time was:           00:01:28.29

The total run time was 114 seconds, more than four times faster than the conventional method's load time of 8 minutes.


III. Parallel Direct Mode with a Split Data File

While the direct load test was running, I observed the CPU, memory, and disk utilization; none of them was used
much. With 8 CPUs and 16 GB of RAM, my aim became beating the 114 seconds obtained from the direct load method.

The only way I could beat the 114-second run time was a divide-and-conquer approach, that is, the parallel option. However,
I had only one data file. Hence, I split the data file (see the shell script below) and used 8 parallel loader sessions to complete the
load.

Script:
#!/bin/ksh
date
/usr/bin/split -l2000000 /u01/oratest/sales_data.dat sales    # 2,000,000-line chunks: salesaa .. salesah
mv salesaa salesaa.dat
mv salesab salesab.dat
mv salesac salesac.dat
mv salesad salesad.dat
mv salesae salesae.dat
mv salesaf salesaf.dat
mv salesag salesag.dat
mv salesah salesah.dat
echo "Split is done"
date
sqlldr / control=load_sales.ctl bad=bad_1.dat streamsize=1048576 readsize=1048576 \
data=salesaa.dat log=load_sales_pll_1.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data01.dbf \
multithreading=true direct=true parallel=true &
sqlldr / control=load_sales.ctl bad=bad_2.dat streamsize=1048576 readsize=1048576 \
data=salesab.dat log=load_sales_pll_2.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data02.dbf \
multithreading=true direct=true parallel=true  &
sqlldr / control=load_sales.ctl bad=bad_3.dat streamsize=1048576 readsize=1048576 \
data=salesac.dat log=load_sales_pll_3.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data03.dbf \
multithreading=true direct=true parallel=true   &
sqlldr / control=load_sales.ctl bad=bad_4.dat streamsize=1048576 readsize=1048576 \
data=salesad.dat log=load_sales_pll_4.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data04.dbf \
multithreading=true direct=true parallel=true &
sqlldr / control=load_sales.ctl bad=bad_5.dat streamsize=1048576 readsize=1048576 \
data=salesae.dat log=load_sales_pll_5.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data05.dbf \
multithreading=true direct=true parallel=true &
sqlldr / control=load_sales.ctl bad=bad_6.dat streamsize=1048576 readsize=1048576 \
data=salesaf.dat log=load_sales_pll_6.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data06.dbf \
multithreading=true direct=true parallel=true  &
sqlldr / control=load_sales.ctl bad=bad_7.dat streamsize=1048576 readsize=1048576 \
data=salesag.dat log=load_sales_pll_7.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data07.dbf \
multithreading=true direct=true parallel=true &
sqlldr / control=load_sales.ctl bad=bad_8.dat streamsize=1048576 readsize=1048576 \
data=salesah.dat log=load_sales_pll_8.log  columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data08.dbf \
multithreading=true direct=true parallel=true &
wait
date
exit 0

Each parallel session reads a small file of about 400 MB and loads it directly into the data blocks.
By adding the FILE parameter, I force each parallel session to write its data into a separate datafile within the tablespace, so the sessions do not contend for extent allocation in the same file.
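
To find candidate datafile names for the FILE parameter, you can list the tablespace's files (a sketch; the tablespace name here is my guess from the datafile paths and will differ in your database):

SQL> SELECT file_name FROM dba_data_files
     WHERE tablespace_name = 'SBLTEST_LARGE';    -- tablespace name assumed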

Log file: nohup_split.out
Sat Nov 12 17:16:36 GMT 2005
Split is done
Sat Nov 12 17:19:11 GMT 2005
SQL*Loader: Release 9.2.0.5.0 - Production on Sat Nov 12 17:19:11 2005
Load completed - logical record count 336066.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Sat Nov 12 17:19:35 GMT 2005

Total run time: 180 seconds.

In this method, the Unix split alone consumed 156 seconds and the SQL*Loader sessions ran for only 24 seconds.
The overall result of 180 seconds is NOT attractive compared with the simple direct load's run time of 114 seconds.

IV. Parallel Direct Mode without Splitting the Data File

I had to come up with a different idea in order to beat the direct load run time of 114 seconds. Finally, two parameters, SKIP and
LOAD, gave me a clue. By combining the three parameters PARALLEL, SKIP, and LOAD, I wrote the following script to load the
single data file.

Script : load_sales_pll.ksh
#!/bin/ksh
date
## 512KB = <524288>
## 1 MB =<1048576>
## 2 MB = <2097152>
## 4 MB = <4194304>
## 8 MB = <8388608>
sqlldr / control=load_sales.ctl bad=bad_1.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_1.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data01.dbf \
multithreading=true direct=true parallel=true load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_2.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_2.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data02.dbf \
multithreading=true direct=true parallel=true skip=2000000 load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_3.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_3.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data03.dbf \
multithreading=true direct=true parallel=true skip=4000000 load=2000000  &
sqlldr / control=load_sales.ctl bad=bad_4.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_4.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data04.dbf \
multithreading=true direct=true parallel=true skip=6000000 load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_5.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_5.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data05.dbf \
multithreading=true direct=true parallel=true skip=8000000 load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_6.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_6.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data06.dbf \
multithreading=true direct=true parallel=true skip=10000000 load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_7.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_7.log columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data07.dbf \
multithreading=true direct=true parallel=true skip=12000000 load=2000000 &
sqlldr / control=load_sales.ctl bad=bad_8.dat streamsize=1048576 readsize=1048576 \
data=sales_data.dat log=load_sales_pll_8.log  columnarrayrows=8000 \
file=/u01/oratest/sbltest_large_data08.dbf \
multithreading=true direct=true parallel=true skip=14000000 &
wait
date
exit 0

I did not split the data file. Using the SKIP parameter, I told each SQL*Loader session to skip a certain number of logical
records before it starts writing records into data blocks; the LOAD parameter caps how many records the session then loads.

For example, the first loader session inserts only the first 2 million rows.
The 2nd loader session skips the first 2 million rows and inserts the next 2 million rows.
The 3rd loader session skips the first 4 million rows and inserts only the next 2 million rows, and so on.
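
The last session has no LOAD parameter, so it loads whatever remains after skipping 14,000,000 records: 14,336,066 - 14,000,000 = 336,066 logical records, which matches the log below. Rather than hard-coding the numbers, the SKIP/LOAD pairs can be derived from the file's line count, as in this sketch (my addition, using the paths from this post):

#!/bin/ksh
# Sketch: derive SKIP/LOAD values for N parallel direct path sessions
# from the line count of a single data file.
N=8
TOTAL=$(wc -l < /u01/oratest/sales_data.dat)
CHUNK=$(( (TOTAL + N - 1) / N ))      # round up so N chunks cover the file
i=0
while [ $i -lt $N ]
do
  print "session $((i + 1)): skip=$((i * CHUNK)) load=$CHUNK"
  i=$((i + 1))
done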

Log: nohup_pll_1.out
Sat Nov 12 16:17:04 GMT 2005
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 336066.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Load completed - logical record count 2000000.
Sat Nov 12 16:17:42 GMT 2005

Total run time: 38 seconds.

Bingo! The total run time was 38 seconds, beating the direct load method's 114 seconds by a factor of three.

The run times from the 8 log files are given below:

Log file              1     2     3     4     5     6     7     8
Run time (seconds)    26    27    29    31    34    35    37    29

The run times grow with the SKIP value, most likely because each session still has to read past the records it skips; the last session skips the most but loads only 336,066 records, which is why it finishes in 29 seconds.
Tips:

1. As a DBA/developer, you need to find the correct values for ROWS, READSIZE, STREAMSIZE, and
COLUMNARRAYROWS in your production environment. You will need to run several tests to get the best performance.
2. If the input data arrives as a single file, use the parallel direct mode with the SKIP/LOAD option (a generalized driver is sketched after this list).
3. If you receive multiple data files, use the parallel direct mode with one file per session, as in section III; there is no need to run the Unix split
command.
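
Putting tips 1 and 2 together, the whole section IV script can be generalized into a small driver. A sketch (my addition; buffer sizes are starting points to tune, CHUNK must cover the file in N pieces, and the datafile names follow this post's naming):

#!/bin/ksh
# Sketch: launch N parallel direct path sessions against one data file,
# pinning each session to its own datafile via the FILE parameter.
# If fewer than LOAD records remain, a session simply loads what is left.
N=8
CHUNK=2000000
i=0
while [ $i -lt $N ]
do
  n=$((i + 1))
  sqlldr / control=load_sales.ctl data=sales_data.dat bad=bad_$n.dat \
         log=load_sales_pll_$n.log direct=true parallel=true \
         multithreading=true readsize=1048576 streamsize=1048576 \
         columnarrayrows=8000 skip=$((i * CHUNK)) load=$CHUNK \
         file=/u01/oratest/sbltest_large_data0$n.dbf &
  i=$n
done
wait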

Hope this helps ...

Tamilselvan G

4 comments:

1. Great illustration and explanation!

2. That was really helpful, but with the parallel option the indexes have to be disabled. Is there a way to avoid making the indexes unusable, especially the primary key?

3. It has been a great help to us; thanks a lot for your efforts :)