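UPDATE works fine despite ORA-904 in the subquery (but is really very slow)

Time: 2016-11-09 22:09:20

Tags: sql oracle oracle12c

I have an UPDATE statement with a subquery in its WHERE clause that finds duplicates. Run on its own, the subquery raises an error, but run inside the UPDATE statement no error is raised and the DML works (although very slowly).

Table setup: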

CREATE TABLE RAW_table
(
  ERROR_LEVEL      NUMBER(3),
  RAW_DATA_ROW_ID  INTEGER,
  ATTRIBUTE_1      VARCHAR2(4000 BYTE)
)
;

INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    2,  '509NTQD9Q868');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');
INSERT INTO RAW_table VALUES (0,    3,  '509NTVS9Q863');

COMMIT;

The query that raises the error is:

SELECT UPPER(ATTRIBUTE_1), rid
  FROM ( SELECT UPPER(ATTRIBUTE_1)
              , ROWID AS rid
              , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
           FROM RAW_table
       )
 WHERE RN > 1;

Running it raises ORA-00904: "ATTRIBUTE_1": invalid identifier.

However, the following DML, which uses the query above in its WHERE clause (starting at line 4), works fine:

set timing on

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (ATTRIBUTE_1), rid
           FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )
;

4 rows updated.
Elapsed: 00:00:00.36

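Why?

I expected the UPDATE to fail with ORA-00904: "ATTRIBUTE_1": invalid identifier as well. Why doesn't it?

The real problem, however, is not that the UPDATE works at all, but that it runs very slowly.

When I correct the subquery so that it no longer triggers ORA-00904: "ATTRIBUTE_1": invalid identifier, like this: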

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
        IN (SELECT checked_column, rid
           FROM (SELECT UPPER (ATTRIBUTE_1) AS checked_column, ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )
;

the query runs almost 400 times faster on a test dataset of about 11,000 rows:

SELECT COUNT(*) FROM RAW_table;

  COUNT(*)
----------
     11004
1 row selected.

Corrected query:

1005 rows updated.
Elapsed: 00:00:00.28

Query with ORA-904:

1005 rows updated.
Elapsed: 00:01:48.40

I did not have the patience to wait for the test with about 71,000 rows to finish:

SELECT COUNT(*) FROM RAW_table;
  COUNT(*)
----------
     71475
1 row selected.

Corrected query:

11004 rows updated.
Elapsed: 00:00:00.60

Query with ORA-904:

Cancelled after 30 minutes...

Explain plan for the query with ORA-904:

UPDATE STATEMENT  ALL_ROWS     Cost: **2 544 985 615**  Bytes: 8 464 752  Cardinality: 4 176  
     7 UPDATE RAW_TABLE 
          6 FILTER  
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  
               5 VIEW  Cost: 30 486  Bytes: 2 087 850  Cardinality: 83 514  
                    4 WINDOW SORT  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                         3 FILTER  
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  

Explain plan for the corrected query:

UPDATE STATEMENT  ALL_ROWS     Cost: **36 637**  Bytes: 3 374 235  Cardinality: 835  
     7 UPDATE RAW_TABLE 
          6 HASH JOIN RIGHT SEMI  Cost: 36 637  Bytes: 3 374 235  Cardinality: 835  
               4 VIEW VIEW SYS.VW_NSO_1 Cost: 30 486  Bytes: 168 197 196  Cardinality: 83 514  
                    3 VIEW  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                         2 WINDOW SORT  Cost: 30 486  Bytes: 169 282 878  Cardinality: 83 514  
                              1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  
               5 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 169 282 878  Cardinality: 83 514  

After analyzing the table, the plans keep the same shape. Explain plan for the query with ORA-904:

UPDATE STATEMENT  ALL_ROWS     Cost: **29 381 690**  Bytes: 38  Cardinality: 2
     7 UPDATE RAW_TABLE
          6 FILTER
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475
               5 VIEW  Cost: 427  Bytes: 1 786 875  Cardinality: 71 475
                    4 WINDOW SORT  Cost: 427  Bytes: 1 358 025  Cardinality: 71 475
                         3 FILTER
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475

Explain plan for the corrected query:

UPDATE STATEMENT  ALL_ROWS     Cost: **3 123**  Bytes: 1 453 595  Cardinality: 715
     7 UPDATE RAW_TABLE
          6 HASH JOIN SEMI  Cost: 3 123  Bytes: 1 453 595  Cardinality: 715
               5 VIEW VIEW SYS.VW_NSO_1 Cost: 427  Bytes: 143 950 650  Cardinality: 71 475
                    4 VIEW  Cost: 427  Bytes: 144 879 825  Cardinality: 71 475
                         3 WINDOW SORT  Cost: 427  Bytes: 1 358 025  Cardinality: 71 475
                              2 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475
               1 TABLE ACCESS FULL TABLE RAW_TABLE Cost: 54  Bytes: 1 358 025  Cardinality: 71 475

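The explain plan costs say it all, but why is the difference so large?

I have just re-run the 71,000-row test after gathering statistics on the table, but it has already been running for several minutes...

All of this is on Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit.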
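
For readers who want to reproduce the two plan shapes, here is a minimal sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. The plans above appear to come from a GUI tool, so the text layout produced this way will differ. The statement is the ORA-904 variant from the question; swapping in the corrected variant should show HASH JOIN SEMI in place of the top-level FILTER.

```sql
-- Store the plan of the slow (ORA-904) variant in the plan table
EXPLAIN PLAN FOR
UPDATE RAW_table
   SET ERROR_LEVEL = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID)
       IN (SELECT UPPER (ATTRIBUTE_1), rid
             FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                        , ROW_NUMBER() OVER (PARTITION BY UPPER (ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                     FROM RAW_table)
            WHERE RN > 1);

-- Print the most recently explained plan
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```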

4 answers:

Answer 0: (score: 3)

Your SELECT fails because the subquery exposes no column named ATTRIBUTE_1. You need to give the expression a name:

SELECT UPPER(ATTRIBUTE_1), rid
  FROM ( SELECT UPPER(ATTRIBUTE_1) as ATTRIBUTE_1, 
                ROWID AS rid,
                ROW_NUMBER() OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
         FROM RAW_table
       )
 WHERE RN > 1;

The UPDATE does not raise an error because it takes the value from the outer query:

UPDATE RAW_table
-------^
|   SET ERROR_LEVEL   = 4
| WHERE (UPPER (ATTRIBUTE_1), ROWID) IN 
|         (SELECT UPPER (ATTRIBUTE_1), rid
-------------------------^  This is interpreted as RAW_table.ATTRIBUTE_1 of the UPDATE
           FROM (SELECT UPPER(ATTRIBUTE_1), ROWID AS rid,
                        ROW_NUMBER() OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) AS RN
                 FROM RAW_table
                )
           WHERE RN > 1
          )

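This correlation is probably not what you want, and it is one reason I recommend always qualifying column names (that is, prefixing them with a table alias).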
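
The name-resolution rule can be seen in isolation with a much smaller sketch against the question's RAW_table. The inline view here deliberately exposes only rid, and the aliases t and v are introduced purely for illustration. An unqualified name that is not found in the innermost FROM clause is looked up in the enclosing query, which is what silently turns the subquery into a correlated one.

```sql
-- Standalone: ORA-00904, because the inline view exposes only RID
SELECT ATTRIBUTE_1
  FROM (SELECT ROWID AS rid FROM RAW_table);

-- Nested: accepted, because the unqualified ATTRIBUTE_1 now resolves
-- to the outer table t, so the subquery is correlated
SELECT COUNT(*)
  FROM RAW_table t
 WHERE t.ATTRIBUTE_1 IN (SELECT ATTRIBUTE_1
                           FROM (SELECT ROWID AS rid FROM RAW_table));

-- Qualified: ORA-00904 again, because v has no column ATTRIBUTE_1
SELECT COUNT(*)
  FROM RAW_table t
 WHERE t.ATTRIBUTE_1 IN (SELECT v.ATTRIBUTE_1
                           FROM (SELECT ROWID AS rid FROM RAW_table) v);
```

A correlated subquery of this kind is typically evaluated through a FILTER operation, as in the slow plan in the question, rather than being unnested into a single semi join.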

Answer 1: (score: 2)

This is why aliases are so useful.

In the query

UPDATE RAW_table
   SET ERROR_LEVEL   = 4
 WHERE (UPPER (ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (ATTRIBUTE_1), rid
           FROM (SELECT UPPER (ATTRIBUTE_1), ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (ATTRIBUTE_1) 
                                               ORDER BY RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table
                )
           WHERE RN > 1
          )

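SELECT UPPER (ATTRIBUTE_1) works because it can be resolved as a reference to the table you are updating rather than to the inline view in the FROM clause. With aliases, that query is equivalent to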

UPDATE RAW_table dest
   SET dest.ERROR_LEVEL   = 4
 WHERE (UPPER (dest.ATTRIBUTE_1), ROWID) 
       IN (SELECT UPPER (dest.ATTRIBUTE_1), src.rid
           FROM (SELECT UPPER (rt.ATTRIBUTE_1), rt.ROWID AS rid
                     , ROW_NUMBER() OVER ( PARTITION BY UPPER (rt.ATTRIBUTE_1) 
                                               ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                  FROM RAW_table rt
                ) src
           WHERE src.RN > 1
          )

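Of course, written this way it is immediately clear that you are referencing dest.attribute_1 rather than src.attribute_1. This (among many other reasons) is why qualifying columns with aliases is a good idea: it states clearly which object you mean to reference, and it raises an error when the intended reference is invalid instead of silently resolving to something you did not intend.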
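
Putting the two points together, a fully qualified form of the corrected statement from the question might look like the sketch below. The aliases dest, rt and src follow the ones used above, and checked_column is the name the question chose. With every column qualified, a reference such as src.ATTRIBUTE_1 would fail with ORA-00904 instead of quietly binding to the updated table.

```sql
UPDATE RAW_table dest
   SET dest.ERROR_LEVEL = 4
 WHERE (UPPER (dest.ATTRIBUTE_1), dest.ROWID)
       IN (SELECT src.checked_column, src.rid
             FROM (SELECT UPPER (rt.ATTRIBUTE_1) AS checked_column
                        , rt.ROWID AS rid
                        , ROW_NUMBER() OVER ( PARTITION BY UPPER (rt.ATTRIBUTE_1)
                                                  ORDER BY rt.RAW_DATA_ROW_ID) AS RN
                     FROM RAW_table rt
                  ) src
            -- keep every row of a group except the first one
            WHERE src.RN > 1
          );
```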

Answer 2: (score: 0)

Patient::where('id', $patientId)->update(['doctor_id' => $result->doctor_id]);

Answer 3: (score: 0)

Maybe these versions are faster (at least they are more compact):

UPDATE RAW_table
SET ERROR_LEVEL = 4
WHERE ROWID <>ALL (SELECT MIN(ROWID) FROM RAW_table GROUP BY UPPER(ATTRIBUTE_1));


UPDATE RAW_table
SET ERROR_LEVEL = 4
WHERE ROWID <>ALL (SELECT FIRST_VALUE(ROWID) OVER (PARTITION BY UPPER(ATTRIBUTE_1) ORDER BY RAW_DATA_ROW_ID) FROM RAW_table);

Note that <>ALL is equivalent to NOT IN; I personally prefer <>ALL.
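
For comparison, the first variant spelled with NOT IN might look like the sketch below. It is not a drop-in replacement for the statement in the question: MIN(ROWID) keeps the row with the lowest ROWID in each group, whereas the original keeps the row with the lowest RAW_DATA_ROW_ID, so the two can flag different rows when those orders disagree. ROWID is never NULL for table rows, so the usual NOT IN pitfall with NULLs should not apply here.

```sql
UPDATE RAW_table
   SET ERROR_LEVEL = 4
 -- flag every row that is not the "first" row of its UPPER(ATTRIBUTE_1) group
 WHERE ROWID NOT IN (SELECT MIN(ROWID)
                       FROM RAW_table
                      GROUP BY UPPER(ATTRIBUTE_1));
```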