Top-N query doing too much work, despite STOPKEY optimization

Time: 2013-05-21 21:30:13

Tags: performance oracle query-optimization top-n

This is going to be long, so here's a quick summary to draw you in: my top-N query, which has COUNT STOPKEY and ORDER BY STOPKEY in its plan, is still slow for no good reason.

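Now, the details. It starts with a slow function. In real life it involves string manipulation with regexps. For demonstration purposes, here's an intentionally stupid recursive Fibonacci algorithm. I find it to be pretty fast for inputs up to about 25, slow around 30, and ridiculous at 35.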

-- I repeat: Please no advice on how to do Fibonacci correctly.
-- This is slow on purpose!
CREATE OR REPLACE FUNCTION tmp_fib (
  n INTEGER
)
  RETURN INTEGER
AS
BEGIN
  IF n = 0 OR n = 1 THEN
    RETURN 1;
  END IF;
  RETURN tmp_fib(n-2) + tmp_fib(n-1);
END;
/

Now some input: a list of names and numbers.

CREATE TABLE tmp_table (
  name VARCHAR2(20) UNIQUE NOT NULL,
  num NUMBER(2,0)
);
INSERT INTO tmp_table (name,num)
  SELECT 'Alpha',    10 FROM dual UNION ALL
  SELECT 'Bravo',    11 FROM dual UNION ALL
  SELECT 'Charlie',  33 FROM dual;

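Here's an example of a slow query: it uses the slow Fibonacci function to select the rows whose num produces a Fibonacci number containing a doubled digit.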

SELECT p.name, p.num
FROM tmp_table p
WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
ORDER BY p.name;

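This is true for 11 and 33, so Bravo and Charlie are in the output. It takes about 5 seconds to run, almost all of which is the slow calculation of tmp_fib(33). So I want a faster version of the slow query, made by converting it to a top-N query. With N = 1, it looks like this: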

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

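And now it returns the top result, Bravo. But it still takes 5 seconds to run! The only explanation is that it's still calculating tmp_fib(33), even though the result of that calculation is irrelevant to the output. It should have been able to decide that Bravo was going to be output, so there's no need to test the WHERE condition for the rest of the table.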
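One way to check this explanation directly, rather than infer it from timing, is to log every call. The following is a minimal sketch: tmp_fib_traced is a hypothetical wrapper that is not part of the original question, and it assumes a client such as SQL*Plus where SET SERVEROUTPUT ON displays DBMS_OUTPUT text.

```sql
-- Hypothetical tracing wrapper: logs its argument, then delegates to tmp_fib.
-- The recursion inside tmp_fib does not go through this wrapper, so each
-- row tested by the query produces exactly one log line.
CREATE OR REPLACE FUNCTION tmp_fib_traced (
  n INTEGER
)
  RETURN INTEGER
AS
BEGIN
  DBMS_OUTPUT.PUT_LINE('tmp_fib called with n = ' || n);
  RETURN tmp_fib(n);
END;
/

SET SERVEROUTPUT ON

-- Same top-1 query as above, with the traced wrapper in the filter.
SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib_traced(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;
```

If a line for n = 33 shows up, the filter was evaluated for Charlie's row even though only Bravo is returned.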

I thought that maybe the optimizer just needs to be told that tmp_fib is expensive. So I tried telling it that, like this:

ASSOCIATE STATISTICS WITH FUNCTIONS tmp_fib DEFAULT COST (999999999,0,0);

That changes some of the cost numbers in the plan, but it doesn't make the query run any faster.
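To confirm that the association was actually registered, and to remove it once the experiment is over, something like the following should work. This is a sketch that assumes the function lives in the current schema; USER_ASSOCIATIONS is the data dictionary view that lists statistics associations.

```sql
-- Show the default costs now associated with the function.
SELECT object_name, def_cpu_cost, def_io_cost, def_net_cost
FROM user_associations
WHERE object_name = 'TMP_FIB';

-- Remove the association again.
DISASSOCIATE STATISTICS FROM FUNCTIONS tmp_fib;
```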

Output of SELECT * FROM v$version, in case this is version-dependent:

Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

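And here's the autotrace of the top-1 query. It appears to be claiming that the query took 1 second, but that's not true. It ran for about 5 seconds.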

NAME                        NUM
-------------------- ----------
Bravo                        11


Execution Plan
----------------------------------------------------------
Plan hash value: 548796432

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |    55 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           |     1 |    55 |     4  (25)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|           |     1 |    55 |     4  (25)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    55 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
         27  recursive calls
          0  db block gets
         25  consistent gets
          0  physical reads
          0  redo size
        593  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed
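The Time column in an autotrace plan is an optimizer estimate, not a measurement, which is why it can disagree with the wall clock. A sketch for collecting actual figures instead, assuming SQL*Plus and a user with enough privileges to call DBMS_XPLAN.DISPLAY_CURSOR:

```sql
-- Print the measured elapsed time after each statement.
SET TIMING ON
-- With SERVEROUTPUT on, the "last" cursor may be a DBMS_OUTPUT call instead.
SET SERVEROUTPUT OFF

-- The hint asks Oracle to record row-source statistics for this execution.
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

-- Show the plan of the last statement together with its actual statistics.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

In that output, the A-Rows and A-Time columns report actual row counts and elapsed time per plan step, next to the estimated E-Rows.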

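UPDATE: As I mentioned in the comments, an INDEX hint helps this query a lot. It's good enough to be accepted as the correct answer, even though it doesn't translate well to my real-world scenario. And, ironically, Oracle seems to have learned from the experience and is now choosing the INDEX plan by default; I have to tell it NO_INDEX to reproduce the original slow behavior.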
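The exact hint text isn't shown in this thread, so the following is only a sketch of what the hinted query might look like. It assumes the alias p and relies on the index that the UNIQUE constraint on name creates; INDEX(p) with no index name lets the optimizer pick among the indexes available on the table.

```sql
-- Ask for index access so rows can arrive already ordered by name.
SELECT * FROM (
  SELECT /*+ INDEX(p) */ p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

-- Forbid index access, to reproduce the full-scan-plus-sort behavior.
SELECT * FROM (
  SELECT /*+ NO_INDEX(p) */ p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;
```

Whether the first form really avoids the sort depends on the plan the optimizer picks, so it is worth checking the plan for an INDEX FULL SCAN with no SORT ORDER BY step.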

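In the real-world scenario I've applied a more complex solution, rewriting the query as a PL/SQL function. Here's how my technique looks, applied to the fib problem: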

CREATE OR REPLACE PACKAGE tmp_package IS
  TYPE t_namenum IS TABLE OF tmp_table%ROWTYPE;
  FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED;
END;
/

CREATE OR REPLACE PACKAGE BODY tmp_package IS
  FUNCTION get_interesting_names (howmany INTEGER) RETURN t_namenum PIPELINED IS
    CURSOR c IS SELECT name, num FROM tmp_table ORDER BY name;
    rec c%ROWTYPE;
    outcount INTEGER;
  BEGIN
    OPEN c;
    outcount := 0;
    WHILE outcount < howmany LOOP
      FETCH c INTO rec;
      EXIT WHEN c%NOTFOUND;
      IF REGEXP_LIKE(tmp_fib(rec.num), '(.)\1') THEN
        PIPE ROW(rec);
        outcount := outcount + 1;
      END IF;
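    END LOOP;
    CLOSE c;  -- release the cursor once enough matches have been piped
    RETURN;   -- a pipelined function ends with a bare RETURN
  END;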
END;
/

SELECT * FROM TABLE(tmp_package.get_interesting_names(1));

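Thanks to the responders who read the question, ran the tests, and helped me understand the execution plans. I'll dispose of this question however they suggest.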

2 answers:

Answer 0 (score: 2)

A follow-up to the comments, as this is too big to fit there. Running under 11.2.0.3 (OEL), your query:

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.094
Plan hash value: 1058933870

----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    25 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY       |           |       |       |            |          |
|*  2 |   VIEW               |           |     3 |    75 |     4  (25)| 00:00:01 |
|   3 |    SORT ORDER BY     |           |     3 |    75 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TMP_TABLE |     3 |    75 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

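Note the SORT ORDER BY (rather than the SORT ORDER BY STOPKEY you were seeing) and the corresponding Rows values. Moving the ORDER BY into a sub-select gives something that looks more like yours: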

SELECT * FROM (
  SELECT * FROM (
    SELECT p.name, p.num
    FROM tmp_table p
    ORDER BY p.name
  )
  WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:07.894
Plan hash value: 548796432

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |     1 |    25 |   171  (99)| 00:00:03 |
|*  1 |  COUNT STOPKEY          |           |       |       |            |          |
|   2 |   VIEW                  |           |     1 |    25 |   171  (99)| 00:00:03 |
|*  3 |    SORT ORDER BY STOPKEY|           |     1 |    25 |   171  (99)| 00:00:03 |
|*  4 |     TABLE ACCESS FULL   | TMP_TABLE |     1 |    25 |   170  (99)| 00:00:03 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter(ROWNUM<=1)
   4 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

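Not sure how helpful or practical this will be in your real-world scenario, but in this case (in my environment, anyway), adding an index across all the fetched columns, to get a full index scan instead of a full table scan, seems to change the behaviour: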

CREATE INDEX tmp_index ON tmp_table(name, num);

index TMP_INDEX created.

SELECT * FROM (
  SELECT p.name, p.num
  FROM tmp_table p
  WHERE REGEXP_LIKE(tmp_fib(p.num), '(.)\1')
  ORDER BY p.name
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.093
Plan hash value: 1841475998

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY    |           |       |       |            |          |
|*  2 |   VIEW            |           |     3 |    75 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN| TMP_INDEX |     3 |    75 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   2 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

SELECT * FROM (
  SELECT * FROM (
    SELECT p.name, p.num
    FROM tmp_table p
    ORDER BY p.name
  )
  WHERE REGEXP_LIKE(tmp_fib(num), '(.)\1')
)
WHERE ROWNUM <= 1;

NAME                        NUM
-------------------- ----------
Bravo                        11 

Elapsed: 00:00:00.093
Plan hash value: 1841475998

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY    |           |       |       |            |          |
|   2 |   VIEW            |           |     1 |    25 |     1   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN| TMP_INDEX |     1 |    25 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=1)
   3 - filter( REGEXP_LIKE (TO_CHAR("TMP_FIB"("P"."NUM")),'(.)\1'))

Note
-----
   - dynamic sampling used for this statement (level=2)

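Incidentally, after I ran this a few times with the ROWNUM variants, I eventually started getting ORA-01000: maximum open cursors exceeded errors. I was dropping all the objects at the end of each run but staying connected. I think that suggests another bug somewhere, though it's probably unrelated to what you're seeing, as it happens even with the index scan.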
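To see which statements are holding cursors open in the current session when ORA-01000 appears, a query along these lines may help. It is a sketch that assumes the user has been granted SELECT on V$OPEN_CURSOR.

```sql
-- Count open cursors per statement text for the current session.
SELECT sql_text, COUNT(*) AS open_count
FROM v$open_cursor
WHERE sid = SYS_CONTEXT('USERENV', 'SID')
GROUP BY sql_text
ORDER BY open_count DESC;
```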

Answer 1 (score: 1)

Interest has apparently faded, so I'll just summarize the possible solutions in a self-answer.

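  1. Upgrade. Newer Oracle versions seem to optimize this kind of query better.
  2. Use an INDEX hint to make the inner query retrieve rows in already-sorted order, which lets STOPKEY work properly.
  3. Rewrite in PL/SQL, with the inner query as a cursor. Fetch from the cursor until you have enough matches, then close it.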