Why the optimizer not picking the correct Index ? Bringing chaos to order ..

Posted by FatDBA on December 13, 2021

Hi Guys,

Would like to discuss one interesting problem that I’ve recently faced while supporting one of the customer migration, where one critical SQL statement turned out to be very slow on this new infrastructure. They moved from 12.1 standalone to 12.2 2-Node RAC cluster and moved to a brand new hardware, but with similar HW and resource configurations as earlier.

As per the team, one of the core application API was behaving very slow while processing requests, and also failing due to time limits applied within the application. Hence the entire system becoming slow after they moved the workload to this new system.

Query was using multiple subqueries or nesting results that it got from them using a UNION ALL operator which returned all rows as it does not eliminate duplicate selected rows. The query was frequently waiting on User IO wait class, specially on ‘direct path read‘ and ‘db file sequential read‘ events, and took ~ 2.7 minutes to complete, but was expected to complete in less than a second. The query text was something like below ..

(select * from (select * from dix_table_A where perfor_column_12=:1 and testcas_idnu=:2 ) where rownum <= :"SYS_B_0") 
union all (select * from (select * from dix_table_A where perfor_column_12=:3 and testcas_idnu=:4 ) where rownum <= :"SYS_B_1") 
union all (select * from (select * from dix_table_A where perfor_column_12=:5 and testcas_idnu=:6 ) where rownum <= :"SYS_B_2") 
union all (select * from (select * from dix_table_A where perfor_column_12=:7 and testcas_idnu=:8 ) where rownum <= :"SYS_B_3");


SQL_ID        PLAN_HASH_VALUE        EXECS    AVG_ETIME AVG_CPU_TIME        AVG_LIO      AVG_PIO
------------- --------------- ------------ ------------ ------------ -------------- ------------
8sn7dhnash901      891893112           45      162.276      101.390   19,618,917.8  11,911,560.6


--- Execution Plan of the ill/slow SQL
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |       |       |    24 (100)|          |
|   1 |  UNION-ALL                            |               |       |       |            |          |
|   2 |   COUNT STOPKEY                       |               |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |   187 | 17952 |     6   (0)| 00:00:03 |
|   4 |     INDEX SKIP SCAN                   | I2DIX_TABLE_A |   168K|       |     2   (0)| 00:00:03 |
|   5 |   COUNT STOPKEY                       |               |       |       |            |          |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |   187 | 17952 |     6   (0)| 00:00:03 |
|   7 |     INDEX SKIP SCAN                   | I2DIX_TABLE_A |   168K|       |     2   (0)| 00:00:03 |
|   8 |   COUNT STOPKEY                       |               |       |       |            |          |
|   9 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |   187 | 17952 |     6   (0)| 00:00:03 |
|  10 |     INDEX SKIP SCAN                   | I2DIX_TABLE_A |   168K|       |     2   (0)| 00:00:03 |
|  11 |   COUNT STOPKEY                       |               |       |       |            |          |
|  12 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |   187 | 17952 |     6   (0)| 00:00:03 |
|  13 |     INDEX SKIP SCAN                   | I2DIX_TABLE_A |   168K|       |     2   (0)| 00:00:03 |
-------------------------------------------------------------------------------------------------------

As per the SQL text, its querying PERFOR_COLUMN_12 and TESTCASE_IDNU columns in the WHERE clause, and per the execution plan (above), I2DIX_TABLE_A index is picked to prepare the plan with INDEX SKIP SCAN as the access method/path for said Index. Index skip scan means, that the leading or other columns of the index is ignored. This costs performance since Oracle has read every item of the first column, second or subsequent columns and check if the second (or third etc.) column is what you searched for. This usually is faster than a full-table scan (depends on your query), but slower than a index range scan.

I2DIX_TABLE_A index is a multi column index which has both of the referenced columns PERFOR_COLUMN_12 and TESTCASE_IDNU, but in the reverse direction, not as the leading columns.

-- I2DIX_TABLE_A Index DDL (BAD Index)
CREATE UNIQUE INDEX "DIXIT"."I2DIX_TABLE_A" ON "DIXIT"."DIX_TABLE_A" ("PRODUCT_CLASSIC_CAT", "BUILD_CLASSIC_ID", "TESTCASE_IDNU", "PERFOR_COLUMN_12");

While doing further analysis, I saw there is already one dedicated index I1DIX_TABLE_A there to cover this combination of columns PERFOR_COLUMN_12 and TESTCASE_IDNU, and in the right order too, but was ignored by the optimizer while preparing the estimations or costing.

-- I1DIX_TABLE_A Index DDL (GOOD Index)
CREATE INDEX "DIXIT"."I1DIX_TABLE_A" ON "DIXIT"."DIX_TABLE_A" ("PERFOR_COLUMN_12", "TESTCASE_IDNU");

Now, this is strange, why the CBO didn’t considered that index even when the required column set is present in the other index and in the right order, and instead it opted to go for an expensive index ? These are the other stats about both the two indexes.

INDEX_NAME           UNIQUENES     BLEVEL DEGREE   TABLE_NAME      NUM_ROWS    LEAF_BLOCKS  INI_TRANS  MAX_TRANS STATUS  LAST_ANAL
-------------------- --------- ---------- -------- --------------- ----------- ------------ ---------- --------- ------  ------------
I1DIX_TABLE_A        NONUNIQUE          4 1        DIX_TABLE_A     33457571    942391        2         255       VALID    01-DEC-21
I2DIX_TABLE_A        UNIQUE             3 1        DIX_TABLE_A     35084294    494579        2         255       VALID    01-DEC-21

All statistics looks good, I mean the row count is almost same, ITL initial and max values, status and stats collection date, but the difference is there for the branch level (BLevel) and Leaf_Blocks as they are different. Taking about the Blevel, it’s the part of the B-tree index that relates to the number of times Oracle has to narrow its search on the index while searching for a particular record, or the blevel is the number of branch levels. On the other hand, leaf_blocks represents number of leaf blocks in an index.

As we know the Blevel and Clustering Factor are important elements of an index scan cost. The index that has lower values for these is likely to be chosen. I2DIX_TABLE_A has lower values than I1DIX_TABLE_A. About the CF, It is hard to decrease clustering factor because it requires rebuilding the table.

So, lets focus to decrease the BLEVEL, will try rebuilding I1DIX_TABLE_A as that might help to decrease the Blevel.

SQL> alter index dixit.I1DIX_TABLE_A rebuild;

Index altered.


-- Index stats after rebuilding 
INDEX_NAME           UNIQUENES     BLEVEL DEGREE   TABLE_NAME      NUM_ROWS    LEAF_BLOCKS  INI_TRANS  MAX_TRANS STATUS  LAST_ANAL
-------------------- --------- ---------- -------- --------------- ----------- ------------ ---------- --------- ------  ------------
I1DIX_TABLE_A        NONUNIQUE          3 1        DIX_TABLE_A     33457571    446703        2         255       VALID    01-DEC-21
I2DIX_TABLE_A        UNIQUE             3 1        DIX_TABLE_A     35084294    494579        2         255       VALID    01-DEC-21

And Yes, both the BLEVEL and the LEAF_BLOCKS of I1DIX_TABLE_A got reset, and I’ve immediately started hearing some good news from the application team about query’s runtime improvements.
The expensive index I2DIX_TABLE_A was finally replaced by the good I1DIX_TABLE_A and that’s how the INDEX SKIP SCAN was replaced with the fast INDEX RANGE SCAN.

-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |     8 |   10M |     4   (0)| 00:00:01 |
|   1 |  UNION-ALL                            |               |       |       |            |          |
|*  2 |   COUNT STOPKEY                       |               |       |       |            |          |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |     2 |   192 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN                  | I1DIX_TABLE_A | 28707 |       |     1   (0)| 00:00:01 |
|*  5 |   COUNT STOPKEY                       |               |       |       |            |          |
|   6 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |     2 |   192 |     1   (0)| 00:00:01 |
|*  7 |     INDEX RANGE SCAN                  | I1DIX_TABLE_A | 28707 |       |     1   (0)| 00:00:01 |
|*  8 |   COUNT STOPKEY                       |               |       |       |            |          |
|   9 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |     2 |   192 |     1   (0)| 00:00:01 |
|* 10 |     INDEX RANGE SCAN                  | I1DIX_TABLE_A | 28707 |       |     1   (0)| 00:00:01 |
|* 11 |   COUNT STOPKEY                       |               |       |       |            |          |
|  12 |    TABLE ACCESS BY INDEX ROWID BATCHED| DIX_TABLE_A   |     2 |   192 |     1   (0)| 00:00:01 |
|* 13 |     INDEX RANGE SCAN                  | I1DIX_TABLE_A | 28707 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

You might be thinking that the other difference is the UNIQUENESS, though the Unique scan cost is lower than Range scan cost. But, in this case, unique scan cannot be used for any index. Unique index is NOT always used for “unique scan.” It depends on predicates. The bad index is chosen not because it is a unique index but its scan cost is lower.

Hope It Helped
Prashant Dixit

This entry was posted on December 13, 2021 at 2:33 PM and is filed under Advanced, troubleshooting. Tagged: Database, oracle, performance, troubleshooting. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

4 Responses to “Why the optimizer not picking the correct Index ? Bringing chaos to order ..”

Vijesh said

December 25, 2021 at 10:53 AM
This is great. Thank you for sharing.

Reply
- FatDBA said
  
  December 26, 2021 at 9:36 AM
  You’re welcome my friend!
  
  Reply
nilesh zodape said

January 28, 2022 at 4:05 PM
Thanks for sharing

Reply
- FatDBA said
  
  January 29, 2022 at 8:58 AM
  Thanks Nilesh!
  
  Reply

Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Likes