Saturday, November 5, 2011

Clustering Factor

Ever since Oracle introduced Cost Based Optimizer, DBAs and developers tried to understand
how Oracle computes the “CLUSTERING FACTOR” metric when ever an index has been
analyzed. Here, I presented a small test case for computing the “CF”.
What is a “Clustering Factor”?
CF is a metric that tells Oracle how the rows in a table are randomly distributed with respect  to
index key values.

I emphasis  “with respect to index key values” words one more time because that’s how Oracle
computes the “CF” – lowest index key value to the highest index key value (when not using
parallel degree) which you will see later in this article.
Good CF Vs Bad CF  OR Good Index Vs Bad Index
Over the years, we have been told if the CF value is close to number of blocks used by table
rows, then it is a “Good”  index meaning the table rows are sorted with respect to the index keys.
On the other hand if the rows are distributed randomly, then it is a bad index, and you will
normally see a very high CF value that is very close the number of rows in the table. When the
optimizer computes execution plan for a SQL statement, it uses the CF value very much. On
many occasions  the optimizer would not choose index access path simply because the CF value
is too high. This is mainly due to bug in calculating CF value, rounding errors,

lack of understanding in CF by Oracle kernel developers, usage of FREELISTS, FREELIST
GROUPS,
table partition etc. In this paper I stick with only “CF” calculation.

I will not talk about performance issues.

First I created a test table in which I stored 50 rows randomly selected from dba_objects.
SQL> create table clf_test
pctfree 95 pctused 5
tablespace tamil_large_data
as select object_id, object_name, status from dba_objects
where rownum < 51
order by dbms_random.value  ;

Table created.

I used PCTFREE 95 so that atleast the rows are populated more than 1 block.
The next SQL verifies that rows are selected randomly.
Look at the OBJECT_ID column values.
SQL> select * from clf_test order by rowid ;

OBJECT_ID OBJECT_NAME                       STATUS                                                
--------- --------------------------------- -------                                               
48598 /1240abae_JdbcOperations          VALID                                                 
49148 /1240f5cf_LessExpression          VALID                                                 
53072 /10744837_ObjectStreamClass2      VALID                                                 
47880 /1225525_PlainTextInputStream     VALID                                                 
60864 /1261d15c_CompilerOptions         VALID                                                 
48658 /12579bb9_ResolutionDriverReso    VALID                                                 
58824 /10e48aa3_StringExpressionCons    VALID                                                 
59234 /1005bd30_LnkdConstant            VALID                                                 
56842 /11144507_CharConv12ByteBuilde    VALID                                                 
47666 /1236a7cd_SignatureFileBlock      VALID                                                 
50556 /10845320_TypeMapImpl             VALID                                                 

------ many lines are omitted --------------

61991 /10c906a0_ProfilePrinterErrors    VALID                                                 
53168 /112785cc_FVDCodeBaseImpl         VALID                                                 
51294 /1146b53d_BasicSliderUIScrollL    VALID                                                 
61373 /122e6c15_OracleSqljXADataSour    VALID                                                 
57102 /11799933_SchemaProtectionDoma    VALID                                                 
62683 /11ba27f1_CustomizerHarnessBea    VALID                                                 
49052 /10804ae7_Constants               VALID                                                 

50 rows selected.

SQL> analyze table clf_test compute statistics ;
Table analyzed.

SQL> select table_name, tablespace_name , blocks, num_rows
from user_tables where table_name = 'CLF_TEST' ;

TABLE_NAME   TABLESPACE_NAME          BLOCKS   NUM_ROWS                                            
------------ -------------------- ---------- ----------                                            
CLF_TEST     TAMIL_LARGE_DATA              7         50                                            

SQL> create index clf_test_idx on clf_test(object_id)
tablespace tamil_large_index compute statistics ;

Index created.

SQL> select index_name, tablespace_name, blevel, leaf_blocks,
distinct_keys, clustering_factor
from user_indexes where index_name = 'CLF_TEST_IDX' ;
INDEX_NAME   TABLESPACE_NAME                 BLEVEL LEAF_BLOCKS    DISTINCT_KEYS CLUSTERING_FACTOR        
------------             ---------------                                     ------         -----------                 -------------                -----------------        
CLF_TEST_IDX TAMIL_LARGE_INDEX                    0           1                           50                         45       
 

SQL> select object_id, dbms_rowid.rowid_block_number(rowid) blk_num
from clf_test   order by object_id;

Note CLF_MANUAL column is computed by me manually.

OBJECT_ID    BLK_NUM  CLF_MANUAL                                                        

---------- ----------  --------                                                        

47666      25611     1     
47736      25613     2      
47880      25610     3      
48402      25613     4    
48598      25610     5
48658      25610     5  -
Clustering Factor remains same when Previous BLOCK # is same
48866      25613     6   
49034      25613     6  -
Clustering Factor remains same when Previous BLOCK # is same
49052      25616     7   
49148      25610     8  
49258      25612     9
49542      25612     9  -
Clustering Factor remains same when Previous BLOCK # is same.    
49714      25614     10
50132      25612     11
50556      25611     12
51156      25612     13  
51294      25615     14
51480      25613     15  
52160      25612     16
52414      25612     16 -
Clustering Factor remains same when Previous BLOCK # is same.     
52462      25614     17
53072      25610     18
53168      25615     19
54186      25613     20  
55536      25612     21
56194      25611     22
56208      25615     23
56346      25614     24
56842      25611     25
57102      25615     26
57104      25611     27
57240      25614     28
58824      25610     29
58834      25614     30
59234      25610     31
59860      25613     32
60044      25615     33
60298      25612     34
60314      25614     35
60696      25611     36
60698      25614     37
60864      25610     38
60981      25613     39
61373      25615     40
61991      25615     40   -
Clustering Factor remains same when Previous BLOCK # is same62005      25611     41
62109      25614     42
62683      25616     43
63606      25611     44
64234      25615     45


My manual computation for “CLUSTERING FACTOR” value 45 is same as Oracle computed
value.

How Oracle gathers index statistics


First I used the “COMPUTE STATISTICS” along with CREATE INDEX command and enabled the
SQL trace. I didn’t see any useful information in the trace file.

If you use DBMS_STATS.gather_index_stats procedure to analyze index and also enable the
SQL trace, you will see the SQL statements for gathering statistics on index.

An example is given below for the index I created in my test case.
The tkprof output shows:
=====================PARSING IN CURSOR #14 len=347 dep=1 uid=31 oct=3 lid=31 tim=1111951510293891 hv=2407722251 ad='8755ccf0'
select /*+ cursor_sharing_exact
dynamic_sampling(0)
no_monitoring
no_expand
index(t,"CLF_TEST_IDX")
noparallel_index(t,"CLF_TEST_IDX") */
count(*) as nrw,
count(distinct sys_op_lbid(67975,'L',t.rowid)) as nlb,
count(distinct "OBJECT_ID") as ndk,
sys_op_countchg(substrb(t.rowid,1,15),1) as clf
from "TAMIL"."CLF_TEST"  t
where "OBJECT_ID" is not null

END OF STMT
PARSE#14:c=10000,e=1256,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1111951510293885
EXEC#14:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1111951510294037
FETCH#14:c=0,e=375,p=0,cr=1,cu=0,mis=0,r=1,dep=1,og=4,tim=1111951510294433
STAT #14 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY '
STAT #14 id=2 cnt=50 pid=1 pos=1 obj=67975 op='INDEX FULL SCAN CLF_TEST_IDX '

Oracle uses a function/procedure sys_op_countchg to calculate the “CLUSTERING FACTOR”.

The substr(rowid,1,15) is for object_id (6 bytes), file_id (3 bytes) and block number (6 bytes) and
they form a unique value. When gathering statistics on index without using parallel degree
option, Oracle reads all the index key values starting from the lowest leaf block to the last leaf
block and at the same time the sys_op_countchg function increments a counter whenever the
block number changes. It stores the previous block number.

(See previous page how I did it manually). 

Cheers, Tamil

No comments:

Post a Comment