Bad clustering factor on >12.1 ? Have you tried Attribute Clustering ?
Posted by FatDBA on May 16, 2022
Recently we were facing one performance issue with one of the SQL which references a highly volatile table in the entire database, and one of the largest too. The tables goes through massive DMLs each minute and that has lead to huge randomness in the data distribution, and expectedly few of its critical indexes has a very high (bad) clustering factor (CF).
A quick explanation on the clustering factor, it is a measure of the ordered-ness of an index in comparison to the table that it is based on. That would be a simple explanation on Oracle’s clustering factor (CF), but if you want to read more about it I would recommend to read blog posts by Richard Foote, Jonathan Lewis and few other great articles by Randolf Geist.
Okay, coming back to the case – We tried few methods (fixed BLevel, Leaf_Blocks, degree, stats recollection etc.), but none of them helped much to stabilize performance of the query, and finally we tried something that was introduced in Oracle 12.1 the ‘Attribute Clustering‘ that helped us to resolve the problem. The attribute clustering improves physical IOs for tables and its partitions. An attribute-clustered table stores data in close proximity on disk in an ordered way based on the values of a certain set of columns in the table or a set of columns in the other tables.
Attribute clustering is a user-defined table directive that provides data clustering on one or more columns in a table. The directives can be specified when the table is created or modified. There are two types of attribute clustering:
- With Linear Ordering : Linear ordering stores the data according to the order of specified columns. This is the default type of clustering.
- With Interleaved Ordering : It accurately determines exactly where data is located on the disk. This enabled I/O Pruning. This uses a special multidimensional clustering technique based on Z-order curve fitting.
Note: Zone mapping is a separately licensed feature.
Let’s do some test to understand the feature better. Here in the demo I would be using linear order clustering without any zonemaps. You can try try pairing the LO clustering with zone maps too. All tests I am doing it on a 19.15.0.0.0 sandbox environment.
For testing purpose, I have created a new tables with some 2000000 random values/rows inserted into it, this is to mimic a bad clustering factor (CF) scenario. I have also created an Index on the top of a column (date) and have collected the statistics.
[oracle@oracleontario ~]$ !sql
sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Apr 27 00:13:08 2022
Version 19.15.0.0.0
Copyright (c) 1982, 2022, Oracle. All rights reserved.
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.15.0.0.0
SQL>
SQL> create table new_test (id number, DOB date, text varchar2(40));
Table created.
SQL> insert into new_test select rownum, sysdate-trunc(dbms_random.value(0, 20000)), 'PRASHANT DIXIT'
from dual connect by level <= 2000000; 2
2000000 rows created.
SQL>
SQL> commit;
Commit complete.
SQL> create index idx_newtest on new_test(dob);
Index created.
SQL> EXEC DBMS_STATS.gather_table_stats('DIXDROID', 'NEW_TEST', estimate_percent => null, method_opt=>'FOR ALL COLUMNS SIZE 1');
PL/SQL procedure successfully completed.
-- Here you can see the extremely bad CLUSTERING FACTOR of the Index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST'; 2
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST IDX_NEWTEST 10097 2000000 1989246
SQL>
With such a bad clustering factor of the Index, let me try to run a query on the same column and pass a date range and see the execution plan.
SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';
Explained.
SQL>
SQL> SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------
Plan hash value: 1847055510
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3100 | 86800 | 2750 (1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| NEW_TEST | 3100 | 86800 | 2750 (1)| 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
15 rows selected.
With a bad clustering factor, the query is going for a full table scan on NEW_TEST table, and the overall cost of the access path is very high too. Let’s alter the table using Linear attribute clustering – I won’t be using materialized zonemaps here
SQL> alter table NEW_TEST add clustering by linear order(DOB) without materialized zonemap;
Table altered.
SQL> alter table NEW_TEST move online;
Table altered.
-- Now if you check you will the improved CF of the Index.
SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor FROM user_tables t, user_indexes i
WHERE t.table_name = i.table_name AND i.index_name='IDX_NEWTEST';
TABLE_NAME INDEX_NAME BLOCKS NUM_ROWS CLUSTERING_FACTOR
---------- -------------------- ---------- ---------- -----------------
NEW_TEST IDX_NEWTEST 10097 2000000 9277
The clustering factor of the Index was greatly improved after altering the table column (DOB) with linear order clustering enabled. Let’s try to run the same SQL and see if there is any improvements or not.
SQL> explain plan for select * from new_test where dob between '01-JUN-2017' and '30-JUN-2017';
Explained.
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
------------------------------------------------------
Plan hash value: 1446839462
---------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3100 | 86800 | 26 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| NEW_TEST | 3100 | 86800 | 26 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_NEWTEST | 3100 | | 11 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DOB">=TO_DATE(' 2017-06-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
"DOB"<=TO_DATE(' 2017-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
15 rows selected.
SQL>
SQL>
And we can see the FTS was replaced with a quick INDEX RANGE SCAN on IDX_NEWTEST index (created to cover DOB column). Look at the great reduction in the cost per step and final cost of the access path and plan tree.
Hope It Helped!
Prashant Dixit
Leave a Reply