Tamilselvan Govindasamy: Parallel DML

There is a myth that if the table is not partitioned, then parallel DML will not work. This is no
longer true. Several years ago I had updated 35 million rows table (size 40GB) using parallel
DML option in 8i. The parallel update ran 6X faster than procedural update using PL/SQL.

In this paper, you will learn how to update a BIG Table using parallel DML option.

All the tests given below are conducted in 10gR2.

First, I created a table, T1 using ALL_OBJECTS. The table has 10,000 rows.

One of the quickest way to find out the effectiveness (throughput) of parallelism in your system
is to run a query (SELECT) using parallel option, and measure the throughput by querying
V$PQ_TQSTAT table. (Note: V$PQ_TQSTAT will be renamed to V$PX_TQSTAT in 11g).
alter session set db_file_multiblock_read_count = 128;

select /*+ FULL(t1) parallel(t1,8) */ * from tamil.t1 ;

select process, num_rows, bytes from v$pq_tqstat order by process;

PROCESS      NUM_ROWS      BYTES
----------       ----------          ----------
P000             1300                664452
P001             1300                664551
P002             1300                664551
P003             1300                664551
P004             1200                613428
P005             1200                613428
P006             1200                613428
P007             1200                613428
QC               10000              5111817     -- Query coordinator
If you see the number of rows evenly distributed among the parallel slave processes, then it
indicates that your system is well configured for parallel processing.
Case # 1 - Simple update on NON-PARTITIONED Table
I have built a table, T1 that has 3.11 Million rows.

SQL> desc t1
Name                       Null?    Type
-------------------------- -------- ------------------
OWNER                       NOT NULL   VARCHAR2(30)
OBJECT_NAME           NOT NULL   VARCHAR2(30)
SUBOBJECT_NAME                        VARCHAR2(30)
OBJECT_ID                 NOT NULL NUMBER
DATA_OBJECT_ID                         NUMBER
OBJECT_TYPE                                VARCHAR2(19)
CREATED                    NOT NULL    DATE
LAST_DDL_TIME       NOT NULL    DATE
TIMESTAMP                                     VARCHAR2(19)
STATUS                                           VARCHAR2(7)
TEMPORARY                                   VARCHAR2(1)
GENERATED                                    VARCHAR2(1)
SECONDARY                                   VARCHAR2(1)

SQL> select count(*) from t1;

COUNT(*)
----------
3111872

SQL> select table_name, num_rows, blocks, empty_blocks,
        last_analyzed from user_tables
where table_name= 'T1' ;

TABLE_NAME     NUM_ROWS     BLOCKS    EMPTY_BLOCKS      LAST_ANALYZ
------------         ----------           ----------   ------------               -----------
T1                        3116183            43085            0                       05-MAR-2007

[I used DBMS_STATS.GATHER_TABLE_STATS procedure to gather table statistics with estimate
percent NULL(meaning 100 percent), but it reported more number of rows as you see in the
USER_TABLES].
Let me do the first test.

SQL> alter session set db_file_multiblock_read_count = 128;

Session altered.

Elapsed: 00:00:00.01

SQL> update tamil.t1 set object_name = lower(object_name);
3111872 rows updated.

Elapsed: 00:09:34.33
SQL> commit;

Commit complete.

Elapsed: 00:00:00.13

A single process ran for 9 Min and 34 seconds.
Case # 2 Parallel Update on NON-PARTITIONED TABLE
I have shutdown the instance and restared.

SQL> alter session enable parallel dml ;

Session altered.

Elapsed: 00:00:00.00

SQL> alter session force parallel dml parallel 8;

Session altered.

Elapsed: 00:00:00.01

SQL> alter session set db_file_multiblock_read_count = 128;

Session altered.

Elapsed: 00:00:00.00

SQL> update /*+ full(t1) parallel(t1,8) */
2         tamil.t1 set object_name = lower(object_name);
3111872 rows updated.

Elapsed: 00:00:28.03
SQL> commit;

Commit complete.

Elapsed: 00:00:01.12
Conclusion:
The parallel update took just 28 seconds versus 9 min 34 seconds in a single process update.
By setting highest value to DB_FILE_MULTIBLOCK_READ_COUNT parameter, "PARALLEL DML"
and "FORCE PARALLEL DML" with correct degree of parallelism at the session level, you can
update a big table in a shorter time.

Do not attempt to process in parallel on a small table. In fact, it will run longer than what a single
process will take.
Hope, this helps.
Tamilselvan G

Tamilselvan Govindasamy

Friday, November 25, 2011

Parallel DML

No comments:

Post a Comment