Monday, November 27, 2023

Bitmap Index Use Case

Bitmap indexes are designed to efficiently handle columns with low cardinality, where the number of distinct values is relatively small.

Instead of storing a list of row identifiers for each distinct value, a Bitmap Index uses a bitmap, a binary representation where each bit corresponds to a row in the table. If a row has a particular value, the corresponding bit is set to 1; otherwise, it is set to 0.

Below use case shows the benefit of bitmap index over traditional b-tree index 

Consider a table orders with 100000 rows and with a column status with only two distinct values - "Pending" and "Shipped" 

Queries filtering data based on Status column will benefit from bitmap indexes over b-tree indexes as demonstrated below 

=> Create table orders 

SQL> set echo on;
SQL> @create-table.sql
SQL> CREATE TABLE orders (
  2      order_id NUMBER PRIMARY KEY,
  3      customer_id NUMBER,
  4      order_date DATE,
  5      product_id NUMBER,
  6      quantity NUMBER,
  7      price NUMBER,
  8      total_amount NUMBER,
  9      status VARCHAR2(20),
 10      shipping_address VARCHAR2(100),
 11      payment_status VARCHAR2(20)
 12  );

Table created.

SQL>

=> insert 100000 rows in orders table, notice that status column is populated only with two distinct values - "Pending" and "Shipped"

SQL> @insert-data.sql
SQL> INSERT INTO orders (order_id, customer_id, order_date, product_id, quantity, price, total_amount, status, shipping_address, payment_status)
  2  SELECT
  3      level,
  4      mod(level, 10) + 1,
  5      SYSDATE - level,
  6      mod(level, 5) + 1,
  7      mod(level, 3) + 1,
  8      50 * (mod(level, 5) + 1),
  9      50 * (mod(level, 5) + 1) * (mod(level, 3) + 1),
 10      CASE WHEN mod(level, 2) = 0 THEN 'Pending' ELSE 'Shipped' END,
 11      'Address ' || level,
 12      CASE WHEN mod(level, 2) = 0 THEN 'Unpaid' ELSE 'Paid' END
 13  FROM dual
 14  CONNECT BY level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL>


=> Verify the plan for the statement without using any indexes. Notice that the cost is 2119 with Full Table Scan 

SQL> set serveroutput on;
SQL> EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Pending';

Explained.

SQL> SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);SQL> SQL>
Plan hash value: 1275100350

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        | 80022 |    12M|  2119   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ORDERS | 80022 |    12M|  2119   (1)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='Pending')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

17 rows selected.

SQL>


=> Verify the plan with BITMAP Index. Notice that the cost has dropped to 404

SQL> CREATE BITMAP INDEX idx_status_bitmap ON orders(status);

Index created.

SQL> EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Pending';

Explained.

SQL> SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);SQL> SQL>
Plan hash value: 3935828245

---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   | 80022 |    12M|   404   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS            | 80022 |    12M|   404   (0)| 00:00:01 |
|   2 |   BITMAP CONVERSION TO ROWIDS       |                   |       |       |            |          |
|*  3 |    BITMAP INDEX SINGLE VALUE        | IDX_STATUS_BITMAP |       |       |            |          |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("STATUS"='Pending')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

19 rows selected.

SQL>


=> Verify the plan with B-TREE Index. Notice that the cost is 995, which is better than Full Table Scan, but still higher than Bitmap index cost.

SQL> drop index idx_status_bitmap;

Index dropped.

SQL> CREATE INDEX idx_status_btree ON orders(status);

Index created.

SQL> EXPLAIN PLAN FOR SELECT * FROM orders WHERE status = 'Pending';

Explained.

SQL> SET LINESIZE 130
SET PAGESIZE 0
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);SQL> SQL>
Plan hash value: 3186639735

--------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                  | 80022 |    12M|   995   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ORDERS           | 80022 |    12M|   995   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_STATUS_BTREE | 80022 |       |   141   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='Pending')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

18 rows selected.

SQL> exit



Additional points on Bitmap Indexes : 

Bitmap Indexes are space-efficient, especially when dealing with columns that have a small number of distinct values. The index size is proportional to the number of distinct values, not the total number of rows.

While Bitmap Indexes offer advantages in read-heavy scenarios, they may not be the best choice for columns with high volatility or frequent updates, as updates can result in the need to rebuild the entire index.

Due to their characteristics, Bitmap Indexes are commonly employed in data warehousing environments where analytical queries are prevalent, and the data is relatively stable.

No comments:

Post a Comment