我有一张150万行的表。我运行一个查询,该查询获取列中具有非重复值的记录。我正在观察一种行为,在创建索引之后,查询的性能会降低。我还使用了100%估计百分比的dbms_stats(计算模式) 收集统计信息,以便oracle 11g CBO为查询计划做出更明智的决定,但它不会改善查询执行时间。
SQL> desc tab3;
Name Null? Type
----------------------------------------------
COL1 NUMBER(38)
COL2 VARCHAR2(100)
COL3 VARCHAR2(36)
COL4 VARCHAR2(36)
COL5 VARCHAR2(4000)
COL6 VARCHAR2(4000)
MEASURE_0 VARCHAR2(4000)
MEASURE_1 VARCHAR2(4000)
MEASURE_2 VARCHAR2(4000)
MEASURE_3 VARCHAR2(4000)
MEASURE_4 VARCHAR2(4000)
MEASURE_5 VARCHAR2(4000)
MEASURE_6 VARCHAR2(4000)
MEASURE_7 VARCHAR2(4000)
MEASURE_8 VARCHAR2(4000)
MEASURE_9 VARCHAR2(4000)
列measure_0
有40万个唯一值。
SQL> select count(*) from (select measure_0 from tab3 group by measure_0 having count(*) = 1) abc;
COUNT(*)
----------
403664
以下是执行计划的查询,请注意表中没有索引。
SQL> set autotrace traceonly;
SQL> SELECT * FROM (
2 SELECT
3 (ROWNUM -1) AS COL1,
4 ft.COL1 AS OLD_COL1,
5 ft.COL2,
6 ft.COL3,
7 ft.COL4,
8 ft.COL5,
9 ft.COL6,
10 ft.MEASURE_0,
11 ft.MEASURE_1,
12 ft.MEASURE_2,
13 ft.MEASURE_3,
14 ft.MEASURE_4,
15 ft.MEASURE_5,
16 ft.MEASURE_6,
17 ft.MEASURE_7,
18 ft.MEASURE_8,
19 ft.MEASURE_9
20 FROM tab3 ft
21 WHERE MEASURE_0 IN
22 (
23 SELECT MEASURE_0
24 FROM tab3
25 GROUP BY MEASURE_0
26 HAVING COUNT(*) = 1
27 )
28 ) ABC WHERE COL1 >= 0 AND COL1 <=449;
450 rows selected.
Elapsed: 00:00:01.90
Execution Plan
----------------------------------------------------------
Plan hash value: 3115757351
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 717K (1)| 02:23:29 |
|* 1 | VIEW | | 1243 | 28M| 717K (1)| 02:23:29 |
| 2 | COUNT | | | | | |
|* 3 | HASH JOIN | | 1243 | 30M| 717K (1)| 02:23:29 |
| 4 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 7 | TABLE ACCESS FULL| TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 37G| 6211 (1)| 00:01:15 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
3 - access("MEASURE_0"="MEASURE_0")
5 - filter(COUNT(*)=1)
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
354 recursive calls
0 db block gets
46518 consistent gets
45122 physical reads
0 redo size
43972 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
查询占用 1.90 秒。如果我再次运行查询,则需要 1.66 秒。为什么第一次运行需要更多时间?
为了加快速度,我在查询中使用的两列上创建了索引。
SQL> create index ind_tab3_orgid on tab3(COL1);
Index created.
Elapsed: 00:00:01.68
SQL> create index ind_tab3_msr_0 on tab3(measure_0);
Index created.
Elapsed: 00:00:01.83
当我第一次在此之后解雇查询时,需要一个大概 21 秒才能回来。而随后的运行将其买入 2.9 秒。为什么oracle在第一次运行中花费了这么多时间,是热身还是什么......让我感到困惑!
这是一个需要2.9秒的计划 -
450 rows selected.
Elapsed: 00:00:02.92
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1243 | 28M| 711K (1)| 02:22:15 |
|* 1 | VIEW | | 1243 | 28M| 711K (1)| 02:22:15 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 1243 | 30M| 711K (1)| 02:22:15 |
| 5 | VIEW | VW_NSO_1 | 1686K| 3219M| 6274 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 3219M| 6274 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1686K| 3219M| 6196 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 1243 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 1243 | 28M| 44 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
我期待时间低于表格未编入索引时的时间。为什么表的索引版本比非索引版本需要更多时间来获取结果?如果我没有错,那就是占用时间的TABLE ACCESS BY INDEX ROWID。我可以强制oracle使用TABLE ACCESS FULL吗?
然后我收集了表格的统计数据,以便CBO通过计算选项改进计划。所以现在统计数据是准确的。
SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'EQUBE67DP', tabname=>'TAB3',estimate_percent=>null,cascade=>true);
PL/SQL procedure successfully completed.
Elapsed: 00:01:02.47
SQL> set autotrace off;
SQL> select COLUMN_NAME,NUM_DISTINCT,SAMPLE_SIZE,HISTOGRAM,LAST_ANALYZED from dba_tab_cols where table_name = 'TAB3' ;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE HISTOGRAM LAST_ANALYZED
------------------------------ ------------ ----------- --------------- ---------
COL1 1502257 1502257 NONE 27-JUN-12
COL2 0 NONE 27-JUN-12
COL3 1 1502257 NONE 27-JUN-12
COL4 0 NONE 27-JUN-12
COL5 1502257 1502257 NONE 27-JUN-12
COL6 1502257 1502257 NONE 27-JUN-12
MEASURE_0 405609 1502257 HEIGHT BALANCED 27-JUN-12
MEASURE_1 128570 1502257 NONE 27-JUN-12
MEASURE_2 1502257 1502257 NONE 27-JUN-12
MEASURE_3 185657 1502257 NONE 27-JUN-12
MEASURE_4 901 1502257 NONE 27-JUN-12
MEASURE_5 17 1502257 NONE 27-JUN-12
MEASURE_6 2202 1502257 NONE 27-JUN-12
MEASURE_7 2193 1502257 NONE 27-JUN-12
MEASURE_8 21 1502257 NONE 27-JUN-12
MEASURE_9 27263 1502257 NONE 27-JUN-12
我再次运行查询
450 rows selected.
Elapsed: 00:00:02.95
Execution Plan
----------------------------------------------------------
Plan hash value: 240271480
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31M| 718G| 8046 (2)| 00:01:37 |
|* 1 | VIEW | | 31M| 718G| 8046 (2)| 00:01:37 |
| 2 | COUNT | | | | | |
| 3 | NESTED LOOPS | | | | | |
| 4 | NESTED LOOPS | | 31M| 62G| 8046 (2)| 00:01:37 |
| 5 | VIEW | VW_NSO_1 | 4057 | 7931K| 6263 (2)| 00:01:16 |
|* 6 | FILTER | | | | | |
| 7 | HASH GROUP BY | | 1 | 20285 | 6263 (2)| 00:01:16 |
| 8 | TABLE ACCESS FULL | TAB3 | 1502K| 7335K| 6193 (1)| 00:01:15 |
|* 9 | INDEX RANGE SCAN | IND_TAB3_MSR_0 | 4 | | 2 (0)| 00:00:01 |
| 10 | TABLE ACCESS BY INDEX ROWID| TAB3 | 779K| 75M| 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("COL1">=0 AND "COL1"<=449)
6 - filter(COUNT(*)=1)
9 - access("MEASURE_0"="MEASURE_0")
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
660054 consistent gets
22561 physical reads
0 redo size
44358 bytes sent via SQL*Net to client
715 bytes received via SQL*Net from client
31 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
450 rows processed
这次查询以 2.9 秒的速度返回(有时也需要 3.9 秒)。
我的目标是尽可能减少查询执行时间。但在添加索引或计算统计数据后,查询时间不断增加。为什么会发生这种情况,如何通过保留索引来改进?
答案 0 :(得分:12)
首先,让我引用Tom Kyte:
只是一遍又一遍地对自己说
“全扫描不邪恶,指标不好” “全扫描不邪恶,指标不好” “全扫描不邪恶,指标不好” “全扫描不邪恶,指标不好” “全扫描不邪恶,指标不好” “全扫描不是邪恶的,索引不好”
索引并不总是提高性能,它们不是神奇的银弹(就好像这样的东西曾经存在过一样)。
现在您要求为什么索引需要更长时间。答案很简单:
换句话说:Oracle使用索引执行的读操作多于全表扫描。这是因为:
至于为什么优化器选择使用这个明显无效的索引,这是因为即使使用esimate_percent=100
和完整的直方图(您已在MEASURE_0
列上收集),仍然会有一些数据分布通过简单的优化器分析无法可靠地表达。特别是,分析器不能很好地理解交叉列和交叉表依赖性。这导致错误的估计,导致计划选择不佳。
编辑:似乎CBO的工作假设根本不适用于此自我加入(您的上一次查询预计会有3100万行,而只会选择450行!)。这非常令人费解,因为该表只有1.5 M行。您使用的是哪个版本的Oracle?
我认为您会发现可以删除自联接,从而通过分析提高查询性能:
SELECT * FROM (
SELECT (ROWNUM -1) AS COL1, ABC.*
FROM (
SELECT
ft.COL1 AS OLD_COL1,
[...],
COUNT(*) OVER (PARTITION BY MEASURE_O) nb_0
FROM tab3 ft
) ABC
WHERE nb_0 = 1
AND ROWNUM - 1 <= 449
) v
WHERE COL1 >= 0;
您还问为什么第一次运行查询会花费更多时间。这是因为有缓存在起作用。在数据库级别上有SGA,其中所有块首先从磁盘复制,然后可以多次读取(第一次查询块始终是物理读取)。然后,一些系统还具有独立的系统缓存,如果最近读取了数据,它将更快地返回数据。
进一步阅读:
答案 1 :(得分:3)
此代码如何执行?
SELECT ROWNUM - 1 AS col1
, ft.col1 AS old_col1
, ft.col2
, ft.col3
, ft.col4
, ft.col5
, ft.col6
, ft.measure_0
, ft.measure_1
, ft.measure_2
, ft.measure_3
, ft.measure_4
, ft.measure_5
, ft.measure_6
, ft.measure_7
, ft.measure_8
, ft.measure_9
FROM tab3 ft
WHERE NOT EXISTS (SELECT NULL
FROM tab3 ft_prime
WHERE ft_prime.measure_0 = ft.measure_0
AND ft_prime.ROWID <> ft.ROWID)
AND ROWNUM <= 450;