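Convoluted nested CASE WHEN - is it better to use subqueries in the SELECT statement or just join the other tables?

Time: 2017-05-02 14:20:28

Tags: oracle performance

This is pseudocode describing my current script: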

SELECT
  A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
               FROM TABLE_B B
              WHERE B.ID = A.ID
                AND B.YEAR = A.YEAR
                AND B.CONDITION = 'TRUE') = A.ID THEN
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
         ELSE 'STATUS Y' END
  ELSE
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
         ELSE 'STATUS NOT FOUND' END
  END AS STATUS
FROM TABLE_A A

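I've simplified this pseudocode. My actual script has more subqueries, all of them hitting the same two tables. This seems overly convoluted, and I'm wondering whether it would be better to join TABLE_B and TABLE_C to my outer query instead. Or perhaps I should select these fields into a temp table and then write a cursor that updates the STATUS field?

The web application that uses this script will be used by many people, so performance is definitely a concern.

1 Answer:

Answer 0: (score: 1)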

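It's hard to say anything definitive without seeing the real data. xQbert has made some interesting comments; I'd also suggest benchmarking your real candidate queries against your real data and comparing the results. The optimizer is pretty smart, but depending on the nature and volume of the data in TABLE_A, TABLE_B, and TABLE_C, you may get different plans.
The answer can vary with the data and with the queries. The surest way to know is to test.

Here's an example:

First, create the test tables and load them. For this example, I'll arbitrarily throw 20K rows into each table, with a high degree of data uniformity across the three.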

CREATE TABLE TABLE_A(ID NUMBER GENERATED ALWAYS AS IDENTITY  NOT NULL PRIMARY KEY, YEAR NUMBER);
CREATE TABLE TABLE_B(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, CONDITION VARCHAR2(20));
CREATE TABLE TABLE_C(ID NUMBER GENERATED ALWAYS AS IDENTITY NOT NULL PRIMARY KEY, YEAR NUMBER, STATUS VARCHAR2(20));


INSERT INTO TABLE_A(YEAR) SELECT LEVEL FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_B(YEAR,CONDITION)
SELECT LEVEL, DECODE(MOD(LEVEL,2),0,'TRUE','FALSE') FROM DUAL CONNECT BY LEVEL <= 20000;
INSERT INTO TABLE_C(YEAR,STATUS)
  SELECT LEVEL, DECODE(MOD(LEVEL,3),0,'X',1,'Y','Z') FROM DUAL CONNECT BY LEVEL <= 20000;

... gather stats
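
The statistics step is elided above. As a minimal sketch of one way to do it, assuming the three test tables live in the current schema (the choice of DBMS_STATS.GATHER_TABLE_STATS with default options is an assumption, not necessarily what was run for this example):

```sql
-- Make the loaded rows permanent before gathering statistics.
COMMIT;

-- Assumption: the tables belong to the current user, and the default
-- sampling and histogram settings are acceptable for a quick test.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_A');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_B');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_C');
END;
/
```

Fresh statistics matter here because the plans compared below are chosen from the optimizer's row-count estimates.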

Then compare your queries. First, the version with scalar subqueries:

SELECT
  A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
               FROM TABLE_B B
              WHERE B.ID = A.ID
                AND B.YEAR = A.YEAR
                AND B.CONDITION = 'TRUE') = A.ID THEN
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
         ELSE 'STATUS Y' END
  ELSE
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
         ELSE 'STATUS NOT FOUND' END
  END AS STATUS
FROM TABLE_A A
ORDER BY 1 ASC;

-----------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name          | Rows  | Bytes | Cost (%CPU)| Time     |  
-----------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |               | 20000 |   156K| 89479   (1)| 00:00:04 |  
|*  1 |  TABLE ACCESS BY INDEX ROWID  | TABLE_B       |     1 |    14 |     2   (0)| 00:00:01 |  
|*  2 |   INDEX UNIQUE SCAN           | SYS_C00409109 |     1 |       |     1   (0)| 00:00:01 |  
|*  3 |   TABLE ACCESS BY INDEX ROWID | TABLE_C       |     1 |    10 |     2   (0)| 00:00:01 |  
|*  4 |    INDEX UNIQUE SCAN          | SYS_C00409111 |     1 |       |     1   (0)| 00:00:01 |  
|*  5 |    TABLE ACCESS BY INDEX ROWID| TABLE_C       |     1 |    10 |     2   (0)| 00:00:01 |  
|*  6 |     INDEX UNIQUE SCAN         | SYS_C00409111 |     1 |       |     1   (0)| 00:00:01 |  
|   7 |  SORT ORDER BY                |               | 20000 |   156K| 89479   (1)| 00:00:04 |  
|   8 |   TABLE ACCESS FULL           | TABLE_A       | 20000 |   156K|    11   (0)| 00:00:01 |  
-----------------------------------------------------------------------------------------------  


Statistics
-----------------------------------------------------------
               4  CPU used by this session
               4  CPU used when call started
              11  DB time
           10592  Requests to/from client
           10591  SQL*Net roundtrips to/from client
           80006  buffer is not pinned count
             766  buffer is pinned count
          117828  bytes received via SQL*Net from client
         2531165  bytes sent via SQL*Net to client
               2  calls to get snapshot scn: kcmgss
               5  calls to kcmgcs
           41127  consistent gets
...
...
etc.
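
For reference, an estimated plan like the one above can be produced with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. This is only a sketch and not necessarily how the output above was captured; run-time figures such as consistent gets come from a separate facility, for example SET AUTOTRACE in SQL*Plus or the autotrace feature of a client tool.

```sql
-- Store the optimizer's estimated plan for the scalar-subquery version.
EXPLAIN PLAN FOR
SELECT
  A.ID
, A.YEAR
, CASE WHEN (SELECT B.ID
               FROM TABLE_B B
              WHERE B.ID = A.ID
                AND B.YEAR = A.YEAR
                AND B.CONDITION = 'TRUE') = A.ID THEN
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'X' THEN 'STATUS X'
         ELSE 'STATUS Y' END
  ELSE
    CASE WHEN (SELECT C.STATUS
                 FROM TABLE_C C
                WHERE C.ID = A.ID
                  AND C.YEAR = A.YEAR) = 'Z' THEN 'STATUS Z'
         ELSE 'STATUS NOT FOUND' END
  END AS STATUS
FROM TABLE_A A
ORDER BY 1 ASC;

-- Format and print the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The same two statements work for any candidate query, so each variant can be checked the same way.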

Compare that with a join query that returns equivalent data:

SELECT
  TABLE_A.ID,
  TABLE_A.YEAR,
  CASE WHEN TABLE_B.CONDITION = 'TRUE'
    THEN
      DECODE(TABLE_C.STATUS, 'X', 'STATUS X', 'STATUS Y')
  ELSE DECODE(TABLE_C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND') END AS STATUS
FROM TABLE_A
  LEFT OUTER JOIN TABLE_B
    ON TABLE_A.ID = TABLE_B.ID
       AND TABLE_A.YEAR = TABLE_B.YEAR
  LEFT OUTER JOIN TABLE_C
    ON TABLE_A.ID = TABLE_C.ID
       AND TABLE_A.YEAR = TABLE_C.YEAR
ORDER BY 1 ASC;

----------------------------------------------------------------------------------  
| Id  | Operation              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |  
----------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT       |         | 20000 |   625K|    42  (10)| 00:00:01 |  
|   1 |  SORT ORDER BY         |         | 20000 |   625K|    42  (10)| 00:00:01 |  
|*  2 |   HASH JOIN RIGHT OUTER|         | 20000 |   625K|    40   (5)| 00:00:01 |  
|   3 |    TABLE ACCESS FULL   | TABLE_C | 20000 |   195K|    13   (0)| 00:00:01 |  
|*  4 |    HASH JOIN OUTER     |         | 20000 |   429K|    26   (4)| 00:00:01 |  
|   5 |     TABLE ACCESS FULL  | TABLE_A | 20000 |   156K|    11   (0)| 00:00:01 |  
|   6 |     TABLE ACCESS FULL  | TABLE_B | 20000 |   273K|    14   (0)| 00:00:01 |  
---------------------------------------------------------------------------------- 

Statistics

-----------------------------------------------------------
           1  CPU used by this session
           1  CPU used when call started
          13  DB time
       10592  Requests to/from client
       10591  SQL*Net roundtrips to/from client
      117622  bytes received via SQL*Net from client
     2531166  bytes sent via SQL*Net to client
           2  calls to get snapshot scn: kcmgss
          11  calls to kcmgcs
         160  consistent gets
...
...
etc.
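
Before trusting a performance comparison, it is worth confirming that the two queries really return the same rows. The following is a minimal sketch of such a check using MINUS in both directions; the names scalar_q, join_q, diff_rows, and WHICH_SIDE are made up for illustration. It relies on ID being a primary key, so each result row is unique and the set semantics of MINUS do not hide duplicates.

```sql
WITH scalar_q AS (
  SELECT A.ID, A.YEAR,
         CASE WHEN (SELECT B.ID FROM TABLE_B B
                     WHERE B.ID = A.ID AND B.YEAR = A.YEAR
                       AND B.CONDITION = 'TRUE') = A.ID THEN
           CASE WHEN (SELECT C.STATUS FROM TABLE_C C
                       WHERE C.ID = A.ID AND C.YEAR = A.YEAR) = 'X'
                THEN 'STATUS X' ELSE 'STATUS Y' END
         ELSE
           CASE WHEN (SELECT C.STATUS FROM TABLE_C C
                       WHERE C.ID = A.ID AND C.YEAR = A.YEAR) = 'Z'
                THEN 'STATUS Z' ELSE 'STATUS NOT FOUND' END
         END AS STATUS
    FROM TABLE_A A
),
join_q AS (
  SELECT A.ID, A.YEAR,
         CASE WHEN B.CONDITION = 'TRUE'
              THEN DECODE(C.STATUS, 'X', 'STATUS X', 'STATUS Y')
              ELSE DECODE(C.STATUS, 'Z', 'STATUS Z', 'STATUS NOT FOUND')
         END AS STATUS
    FROM TABLE_A A
    LEFT OUTER JOIN TABLE_B B ON A.ID = B.ID AND A.YEAR = B.YEAR
    LEFT OUTER JOIN TABLE_C C ON A.ID = C.ID AND A.YEAR = C.YEAR
)
-- Rows produced by only one of the two versions; no rows means they agree.
SELECT 'scalar only' AS WHICH_SIDE, diff_rows.*
  FROM (SELECT * FROM scalar_q MINUS SELECT * FROM join_q) diff_rows
UNION ALL
SELECT 'join only' AS WHICH_SIDE, diff_rows.*
  FROM (SELECT * FROM join_q MINUS SELECT * FROM scalar_q) diff_rows;
```

An empty result only shows agreement on the data currently loaded, so it is worth repeating whenever the test data changes.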

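In this example, all the tables have a similar number of rows and a 1:1 join relationship. As expected, the queries get different plans: the scalar version shows an estimated cost of 89479 and 41127 consistent gets, against a cost of 42 and 160 consistent gets for the join. But what if TABLE_A has 1B rows and few of them join to TABLE_C? Or what if TABLE_B is 100% 'TRUE', or the data is highly skewed, and so on? Both the scalar plan and the join plan will vary with different data sets. Throwing a temp table into the mix will certainly perform differently as well (I'd be surprised if it won the day in this case, but...).
Testing to find what works best for your data is the most reliable approach, especially given your concern about performance.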
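
To try one of the "what if" scenarios above on the same test tables, the data can be skewed in place and both queries rerun. This is a hypothetical variation, not part of the original test: it makes every TABLE_B row 'TRUE' and keeps roughly one TABLE_C row in a hundred, assuming the identity columns produced consecutive IDs.

```sql
-- Hypothetical skew: every TABLE_B row now satisfies CONDITION = 'TRUE'.
UPDATE TABLE_B SET CONDITION = 'TRUE';

-- Hypothetical sparsity: most TABLE_A rows no longer find a TABLE_C match.
DELETE FROM TABLE_C WHERE MOD(ID, 100) <> 0;

COMMIT;

-- Refresh statistics so the optimizer sees the new distribution.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_B');
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TABLE_C');
END;
/
```

After this, rerunning the scalar and join queries and comparing their plans and consistent gets shows how sensitive each approach is to the shape of the data.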