Why am I getting an index scan for an aggregate function on a covered query?

Asked: 2011-02-01 16:44:10

Tags: database performance oracle indexing

I have a query:

select min(timestamp) from table

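The table has more than 600,000 rows, and I delete some of them every day. To decide whether there is enough data to delete, I run the query above. There is a single-column ascending index on timestamp, yet the Oracle query plan shows a full index scan. Shouldn't this be the very definition of a seek?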

Edit to include the plan:

| Id  | Operation                  | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
|   2 |   INDEX FULL SCAN (MIN/MAX)| NEVENTS_I2 |     1 |     8 |     4   (100)| 00:00:01 |
|   1 |  SORT AGGREGATE            |            |     1 |     8 |            |          |
|   0 | SELECT STATEMENT           |            |     1 |     8 |     4   (0)| 00:00:01 |

3 Answers:

Answer 0 (score: 4)

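Can you post the actual query plan? Are you sure it isn't doing a min/max index full scan? As you can see in this example, a min/max index full scan gets the MIN value from a 100,000-row table with only a handful of consistent gets.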

SQL> create table foo (
  2    col1 date not null
  3  );

Table created.

SQL> insert into foo
  2    select sysdate + level
  3      from dual
  4   connect by level <= 100000;

100000 rows created.

SQL> create index idx_foo_col1
  2      on foo( col1 );

Index created.

SQL> analyze table foo compute statistics for all indexed columns;

Table analyzed.

SQL> set autotrace on;

<<Note that I ran this statement once just to get the delayed block cleanout to 
  happen so that the consistent gets number wouldn't be skewed.  You could run a
  different query as well>>

  1* select min(col1) from foo
SQL> /

MIN(COL1)
---------
02-FEB-11


Execution Plan
----------------------------------------------------------
Plan hash value: 817909383

-------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |     7 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |              |     1 |     7 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 |     1 |     7 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------


Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        532  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Answer 1 (score: 3)

At first I thought the index would only be used if the column was declared NOT NULL. I tested with the following setup:

SQL> CREATE TABLE my_table (ts TIMESTAMP);

Table created

SQL> INSERT INTO my_table
  2  SELECT systimestamp + ROWNUM * INTERVAL '1' SECOND 
  3    FROM dual CONNECT BY LEVEL <= 100000;

100000 rows inserted

SQL> CREATE INDEX ix ON my_table(ts);

Index created

SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;

Explained

SQL> SELECT * FROM TABLE(dbms_xplan.display);

--------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |    13 |    69   (2)| 00:00:0
|   1 |  SORT AGGREGATE            |      |     1 |    13 |            |
|   2 |   INDEX FULL SCAN (MIN/MAX)| IX   | 90958 |  1154K|            |
--------------------------------------------------------------------------------

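Here we see that the index is used, but according to the plan estimate all rows in the index are read (Rows = 90958, statement cost 69). If we declare the column NOT NULL, we get a better plan: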

SQL> ALTER TABLE my_table MODIFY ts NOT NULL;

Table altered

SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;

Explained

SQL> SELECT * FROM TABLE(dbms_xplan.display);

--------------------------------------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |    13 |     2   (0)| 00:00:0
|   1 |  SORT AGGREGATE            |      |     1 |    13 |            |
|   2 |   INDEX FULL SCAN (MIN/MAX)| IX   | 90958 |  1154K|     2   (0)| 00:00:0
--------------------------------------------------------------------------------

In fact, adding a WHERE clause yields a plan with the same cost, in which Oracle reads a single row from the index:

SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table WHERE ts IS NOT NULL;

Explained

SQL> SELECT * FROM TABLE(dbms_xplan.display);

--------------------------------------------------------------------------------
| Id  | Operation                   | Name | Rows  | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |      |     1 |    13 |     2   (0)| 00:00:
|   1 |  SORT AGGREGATE             |      |     1 |    13 |            |
|   2 |   FIRST ROW                 |      | 90958 |  1154K|     2   (0)| 00:00:
|   3 |    INDEX FULL SCAN (MIN/MAX)| IX   | 90958 |  1154K|     2   (0)| 00:00:
--------------------------------------------------------------------------------

The last plan shows (Id 2, the FIRST ROW step) that Oracle is indeed performing a "seek".
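
The Rows column in the plans above is an optimizer estimate, not a measurement. One way to check how much work the MIN/MAX scan actually does is to execute the statement with runtime statistics and then display the actual row counts and buffer gets. The following is a minimal sketch, assuming the my_table / ix setup above, a session that is allowed to query the relevant V$ views, and SQL*Plus as the client:

```sql
-- With SERVEROUTPUT on, display_cursor may pick up a different statement.
SET SERVEROUTPUT OFF

-- Collect row-source statistics for this one execution.
SELECT /*+ gather_plan_statistics */ MIN(ts) FROM my_table;

-- Show the plan of the last statement executed in this session,
-- including actual rows (A-Rows) and buffer gets (Buffers) per step.
SELECT * FROM TABLE(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST'));
```

If the MIN/MAX optimization is in effect, the A-Rows and Buffers values for the index step should be small, whatever estimated Rows value EXPLAIN PLAN reports.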

Answer 2 (score: 1)

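Just wanted to home in on the fact that "INDEX FULL SCAN (MIN/MAX)" is totally different from "INDEX FULL SCAN". An INDEX FULL SCAN does indeed scan the entire index (possibly with filtering). An INDEX FULL SCAN (MIN/MAX) or INDEX RANGE SCAN (MIN/MAX), however, only fetches the smallest or largest leaf block (of the range), but can only be used when the column is non-NULL (which is kind of silly, really a bug, since a NULL value is by definition neither a minimum nor a maximum). The (MIN/MAX) optimization is an implicit FIRST_ROWS operation, and the "WHERE ... IS NOT NULL" query condition is not required for the optimization to be performed. Interestingly, the CBO often does not consider the MIN/MAX optimization for function-based indexes, which is another small bug.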
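
To compare these operations side by side, the statements below can be explained one after another. This is only a sketch: it assumes the my_table table and ix index from the previous answer, with ts declared NOT NULL, and the timestamp literal is an arbitrary illustrative value. The operations named in the comments are what one would typically expect, but the optimizer's actual choice depends on the Oracle version and the statistics.

```sql
-- Whole-table minimum: expected INDEX FULL SCAN (MIN/MAX),
-- which descends one edge of the index instead of reading all of it.
EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
SELECT * FROM TABLE(dbms_xplan.display);

-- Minimum within a range: expected INDEX RANGE SCAN (MIN/MAX).
EXPLAIN PLAN FOR
  SELECT MIN(ts) FROM my_table WHERE ts > TIMESTAMP '2011-02-01 00:00:00';
SELECT * FROM TABLE(dbms_xplan.display);

-- Ordered retrieval of every row: a candidate for a plain INDEX FULL SCAN,
-- which walks all leaf blocks in key order.
EXPLAIN PLAN FOR SELECT ts FROM my_table ORDER BY ts;
SELECT * FROM TABLE(dbms_xplan.display);
```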