Clustering Factor
Ever since Oracle introduced Cost Based Optimizer, DBAs and developers tried to understand
how Oracle computes the “CLUSTERING FACTOR” metric when ever an index has been
analyzed. Here, I presented a small test case for computing the “CF”.
What is a “Clustering Factor”?
CF is a metric that tells Oracle how the rows in a table are randomly distributed with respect to
index key values.
I emphasis “with respect to index key values” words one more time because that’s how Oracle
computes the “CF” – lowest index key value to the highest index key value (when not using
parallel degree) which you will see later in this article.
Good CF Vs Bad CF OR Good Index Vs Bad Index
Over the years, we have been told if the CF value is close to number of blocks used by table
rows, then it is a “Good” index meaning the table rows are sorted with respect to the index keys.
On the other hand if the rows are distributed randomly, then it is a bad index, and you will
normally see a very high CF value that is very close the number of rows in the table. When the
optimizer computes execution plan for a SQL statement, it uses the CF value very much. On
many occasions the optimizer would not choose index access path simply because the CF value
is too high. This is mainly due to bug in calculating CF value, rounding errors,
lack of understanding in CF by Oracle kernel developers, usage of FREELISTS, FREELIST
GROUPS,
table partition etc. In this paper I stick with only “CF” calculation.
I will not talk about performance issues.
First I created a test table in which I stored 50 rows randomly selected from dba_objects.
SQL> create table clf_test
pctfree 95 pctused 5
tablespace tamil_large_data
as select object_id, object_name, status from dba_objects
where rownum < 51
order by dbms_random.value ;
Table created.
I used PCTFREE 95 so that atleast the rows are populated more than 1 block.
The next SQL verifies that rows are selected randomly.
Look at the OBJECT_ID column values.
SQL> select * from clf_test order by rowid ;
OBJECT_ID OBJECT_NAME STATUS
--------- --------------------------------- -------
48598 /1240abae_JdbcOperations VALID
49148 /1240f5cf_LessExpression VALID
53072 /10744837_ObjectStreamClass2 VALID
47880 /1225525_PlainTextInputStream VALID
60864 /1261d15c_CompilerOptions VALID
48658 /12579bb9_ResolutionDriverReso VALID
58824 /10e48aa3_StringExpressionCons VALID
59234 /1005bd30_LnkdConstant VALID
56842 /11144507_CharConv12ByteBuilde VALID
47666 /1236a7cd_SignatureFileBlock VALID
50556 /10845320_TypeMapImpl VALID
------ many lines are omitted --------------
61991 /10c906a0_ProfilePrinterErrors VALID
53168 /112785cc_FVDCodeBaseImpl VALID
51294 /1146b53d_BasicSliderUIScrollL VALID
61373 /122e6c15_OracleSqljXADataSour VALID
57102 /11799933_SchemaProtectionDoma VALID
62683 /11ba27f1_CustomizerHarnessBea VALID
49052 /10804ae7_Constants VALID
50 rows selected.
SQL> analyze table clf_test compute statistics ;
Table analyzed.
SQL> select table_name, tablespace_name , blocks, num_rows
from user_tables where table_name = 'CLF_TEST' ;
TABLE_NAME TABLESPACE_NAME BLOCKS NUM_ROWS
------------ -------------------- ---------- ----------
CLF_TEST TAMIL_LARGE_DATA 7 50
SQL> create index clf_test_idx on clf_test(object_id)
tablespace tamil_large_index compute statistics ;
Index created.
SQL> select index_name, tablespace_name, blevel, leaf_blocks,
distinct_keys, clustering_factor
from user_indexes where index_name = 'CLF_TEST_IDX' ;
INDEX_NAME TABLESPACE_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
------------ --------------- ------ ----------- ------------- -----------------
CLF_TEST_IDX TAMIL_LARGE_INDEX 0 1 50 45
SQL> select object_id, dbms_rowid.rowid_block_number(rowid) blk_num
from clf_test order by object_id;
Note CLF_MANUAL column is computed by me manually.
OBJECT_ID BLK_NUM CLF_MANUAL
---------- ---------- --------
47666 25611 1
47736 25613 2
47880 25610 3
48402 25613 4
48598 25610 5
48658 25610 5 - Clustering Factor remains same when Previous BLOCK # is same
48866 25613 6
49034 25613 6 - Clustering Factor remains same when Previous BLOCK # is same
49052 25616 7
49148 25610 8
49258 25612 9
49542 25612 9 - Clustering Factor remains same when Previous BLOCK # is same.
49714 25614 10
50132 25612 11
50556 25611 12
51156 25612 13
51294 25615 14
51480 25613 15
52160 25612 16
52414 25612 16 - Clustering Factor remains same when Previous BLOCK # is same.
52462 25614 17
53072 25610 18
53168 25615 19
54186 25613 20
55536 25612 21
56194 25611 22
56208 25615 23
56346 25614 24
56842 25611 25
57102 25615 26
57104 25611 27
57240 25614 28
58824 25610 29
58834 25614 30
59234 25610 31
59860 25613 32
60044 25615 33
60298 25612 34
60314 25614 35
60696 25611 36
60698 25614 37
60864 25610 38
60981 25613 39
61373 25615 40
61991 25615 40 - Clustering Factor remains same when Previous BLOCK # is same62005 25611 41
62109 25614 42
62683 25616 43
63606 25611 44
64234 25615 45
My manual computation for “CLUSTERING FACTOR” value 45 is same as Oracle computed
value.
How Oracle gathers index statistics
First I used the “COMPUTE STATISTICS” along with CREATE INDEX command and enabled the
SQL trace. I didn’t see any useful information in the trace file.
If you use DBMS_STATS.gather_index_stats procedure to analyze index and also enable the
SQL trace, you will see the SQL statements for gathering statistics on index.
An example is given below for the index I created in my test case.
The tkprof output shows:=====================PARSING IN CURSOR #14 len=347 dep=1 uid=31 oct=3 lid=31 tim=1111951510293891 hv=2407722251 ad='8755ccf0'
select /*+ cursor_sharing_exact
dynamic_sampling(0)
no_monitoring
no_expand
index(t,"CLF_TEST_IDX")
noparallel_index(t,"CLF_TEST_IDX") */
count(*) as nrw,
count(distinct sys_op_lbid(67975,'L',t.rowid)) as nlb,
count(distinct "OBJECT_ID") as ndk,
sys_op_countchg(substrb(t.rowid,1,15),1) as clf
from "TAMIL"."CLF_TEST" t
where "OBJECT_ID" is not null
END OF STMT
PARSE#14:c=10000,e=1256,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1111951510293885
EXEC#14:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=1111951510294037
FETCH#14:c=0,e=375,p=0,cr=1,cu=0,mis=0,r=1,dep=1,og=4,tim=1111951510294433
STAT #14 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT GROUP BY '
STAT #14 id=2 cnt=50 pid=1 pos=1 obj=67975 op='INDEX FULL SCAN CLF_TEST_IDX '
Oracle uses a function/procedure sys_op_countchg to calculate the “CLUSTERING FACTOR”.
The substr(rowid,1,15) is for object_id (6 bytes), file_id (3 bytes) and block number (6 bytes) and
they form a unique value. When gathering statistics on index without using parallel degree
option, Oracle reads all the index key values starting from the lowest leaf block to the last leaf
block and at the same time the sys_op_countchg function increments a counter whenever the
block number changes. It stores the previous block number.
(See previous page how I did it manually).
Cheers, Tamil
No comments:
Post a Comment