Wednesday, February 22, 2023

Check the performance of Index Skip Scan

=> If an index exists on the column used in the WHERE clause, the optimizer will use the index, potentially using Index Range Scan, depending on the selectivity of the index. Even if the index is a concatenated index, where the first column in the index is the column in the WHERE clause, it will still be used by the optimizer.

If there is no index on the column, the optimizer will choose Full Table Scan, which means it will scan the entire table, which can be slow and resource-intensive.

If the column is present in a concatenated index, but it is not the first column, the optimizer will choose "Index Skip Scan." This means that the index is subdivided into multiple indexes based on the distinct values of the first column. If there are fewer distinct values, then there will be fewer sub-indexes, and the Index Skip Scan will be fast. But if there are more distinct values in the first column, there will be more sub-indexes, and the Index Skip Scan will be slower but still better than Full Table Scan.

If the "Cost" difference between Full Table Scan and Index Skip Scan is relatively low, it may be worth creating an index for the column or columns used in the WHERE clause and verifying the cost with the newly created index.


Here is the demo 



SQL> EXPLAIN PLAN FOR SELECT INITCAP(COLUMN_6) COLUMN_NAME FROM XXXX_EMPL_ADDRESS WHERE 1 = 1 AND ADDRESS_TYPE IN ( 'Village' , 'Town') AND ACC_DET_ID = (SELECT DISTINCT ACC_DET_ID FROM XXXX_EMPL_ADDR_DET WHERE 1 = 1 AND ACCT_NUM = 123456 AND EMPL_ID = 12345 ) ORDER BY COLUMN_6 DESC NULLS LAST;

Explained.

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> select * from table(DBMS_XPLAN.DISPLAY());
Plan hash value: 728213998

----------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                   |     1 |    17 |   216   (1)| 00:00:01 |
|   1 |  SORT ORDER BY                       |                   |     1 |    17 |   216   (1)| 00:00:01 |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| XXXX_EMPL_ADDRESS |     1 |    17 |   212   (0)| 00:00:01 |
|*  3 |    INDEX SKIP SCAN                   | XXXX_CA_IDX_02    |     1 |       |   211   (0)| 00:00:01 |
|   4 |     SORT UNIQUE NOSORT               |                   |     1 |    18 |     3  (34)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                | XXXX_CAD_IDX_01   |     1 |    18 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ADDRESS_TYPE"='Town' OR "ADDRESS_TYPE"='Village')
   3 - access("ACC_DET_ID"= (SELECT DISTINCT "ACC_DET_ID" FROM "XXXX"."XXXX_EMPL_ADDR_DET"
              "XXXX_EMPL_ADDR_DET" WHERE "EMPL_ID"=12345 AND TO_NUMBER("ACCT_NUM")=123456))
       filter("ACC_DET_ID"= (SELECT DISTINCT "ACC_DET_ID" FROM "XXXX"."XXXX_EMPL_ADDR_DET"
              "XXXX_EMPL_ADDR_DET" WHERE "EMPL_ID"=12345 AND TO_NUMBER("ACCT_NUM")=123456))
   5 - access("EMPL_ID"=12345)
       filter(TO_NUMBER("ACCT_NUM")=123456)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

28 rows selected.

SQL>



In the above example Index Skip Scan operation is used. Here two tables are queried to fetch the required data. 

XXXX_EMPL_ADDRESS 

XXXX_EMPL_ADDR_DET 

There is no problem accessing XXXX_EMPL_ADDR_DET because the columns ACCT_NUM and EMPL_ID has direct index hit (XXXX_CAD_IDX_01) and Index Range Scan is performed and the cost is also less for this operation (Cost = 2).

In the table XXXX_EMPL_ADDRESS, two columns are queried ADDRESS_TYPE and ACC_DET_ID. Out of which only one column ACC_DET_ID is indexed in a concatenated index XXXX_CA_IDX_02 and it is its second column. So Index Skip Scan Operation is chosen


=> The overall cost of this statement is 216. Now check the cost with Full Table Scan Operation on XXXX_EMPL_ADDRESS table. Cost here is 297. There is only 30 % increase in performance due to Index Skip Scan. This is because the number of distinct rows on index first column is high and close to total number of rows in the table. Leading to multiple sub indexes to be created by Index Skip Scan consuming more time and resources. 




SQL> EXPLAIN PLAN FOR SELECT /*+ FULL(XXXX_EMPL_ADDRESS) */ INITCAP(COLUMN_6) COLUMN_NAME FROM XXXX_EMPL_ADDRESS WHERE 1 = 1 AND ADDRESS_TYPE IN ( 'Village' , 'Town') AND ACC_DET_ID = (SELECT DISTINCT ACC_DET_ID FROM XXXX_EMPL_ADDR_DET WHERE 1 = 1 AND ACCT_NUM = 123456 AND EMPL_ID = 12345 ) ORDER BY COLUMN_6 DESC NULLS LAST;

Explained.

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> select * from table(DBMS_XPLAN.DISPLAY());
Plan hash value: 647256468

------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |     1 |    17 |   297   (2)| 00:00:01 |
|   1 |  SORT ORDER BY       |                   |     1 |    17 |   297   (2)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL  | XXXX_EMPL_ADDRESS |     1 |    17 |   293   (2)| 00:00:01 |
|   3 |    SORT UNIQUE NOSORT|                   |     1 |    18 |     3  (34)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN | XXXX_CAD_IDX_01   |     1 |    18 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("ADDRESS_TYPE"='Town' OR "ADDRESS_TYPE"='Village') AND
              "ACC_DET_ID"= (SELECT DISTINCT "ACC_DET_ID" FROM "XXXX"."XXXX_EMPL_ADDR_DET"
              "XXXX_EMPL_ADDR_DET" WHERE "EMPL_ID"=12345 AND TO_NUMBER("ACCT_NUM")=123456))
   4 - access("EMPL_ID"=12345)
       filter(TO_NUMBER("ACCT_NUM")=123456)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 1 Sql Plan Directive used for this statement

25 rows selected.

SQL>



SQL> SELECT COUNT(DISTINCT EMPL_id) FROM XXXX.XXXX_EMPL_ADDRESS;

COUNT(DISTINCTEMPL_ID)
----------------------
                 33579

SQL> select count(*) from XXXX.XXXX_EMPL_ADDRESS;

  COUNT(*)
----------
     54782

SQL>
    


=> Now create a concatenated index on the two columns used in the XXXX_EMPL_ADDRESS and verify the cost




SQL> create index XXXX.EMPL_IDX_1234 on XXXX.XXXX_EMPL_ADDRESS("ACC_DET_ID","ADDRESS_TYPE");

Index created.

SQL>


SQL> EXPLAIN PLAN FOR SELECT INITCAP(COLUMN_6) COLUMN_NAME FROM XXXX_EMPL_ADDRESS WHERE 1 = 1 AND ADDRESS_TYPE IN ( 'Village' , 'Town') AND ACC_DET_ID = (SELECT DISTINCT ACC_DET_ID FROM XXXX_EMPL_ADDR_DET WHERE 1 = 1 AND ACCT_NUM = 123456 AND EMPL_ID = 12345 ) ORDER BY COLUMN_6 DESC NULLS LAST;

Explained.

SQL> SET LINESIZE 130
SQL> SET PAGESIZE 0
SQL> select * from table(DBMS_XPLAN.DISPLAY());
Plan hash value: 144787732

----------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                   |     1 |    17 |     4  (25)| 00:00:01 |
|   1 |  SORT ORDER BY                       |                   |     1 |    17 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| XXXX_EMPL_ADDRESS |     1 |    17 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | EMPL_IDX_1234     |     1 |       |     2   (0)| 00:00:01 |
|   4 |     SORT UNIQUE NOSORT               |                   |     1 |    18 |     3  (34)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN                | XXXX_CAD_IDX_01   |     1 |    18 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("ACC_DET_ID"= (SELECT DISTINCT "ACC_DET_ID" FROM "XXXX"."XXXX_EMPL_ADDR_DET"
              "XXXX_EMPL_ADDR_DET" WHERE "EMPL_ID"=12345 AND TO_NUMBER("ACCT_NUM")=123456))
       filter("ADDRESS_TYPE"='Town' OR "ADDRESS_TYPE"='Village')
   5 - access("EMPL_ID"=12345)
       filter(TO_NUMBER("ACCT_NUM")=123456)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - 2 Sql Plan Directives used for this statement

26 rows selected.

SQL>
    


=> Cost has come down to 4 from 216 (Index Skip Scan) increasing the performance by 99%






No comments:

Post a Comment