INSERT ... SELECT gets a better plan when a LIMIT clause is added

Asked: 2019-01-17 07:58:14

Tags: postgresql sql-execution-plan insert-select foreign-data-wrapper

This is the server I am running:

select version();
                                                 version
---------------------------------------------------------------------------    
PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

I first wrote the SELECT (ext.t_event and ext.t_event_data are two foreign tables fetched from a remote Oracle database through oracle_fdw, version 1.1):

select 
  te.id_data, 
  te.id_device, 
  te.date_write, 
  te.date_event, 
  ted.i_inout, 
  ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
  and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
  and te.id_data=ted.id_data;

It takes about 10 seconds to fetch the whole result set (3600 rows).

But then I turned the SELECT into an INSERT ... SELECT:

insert into stg_data
select 
  te.id_data, 
  te.id_device, 
  te.date_write, 
  te.date_event, 
  ted.i_inout, 
  ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
  and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
  and te.id_data=ted.id_data;

I had to kill that query: it had been running for more than 30 minutes!

After hours of struggling and desperate attempts, I decided to try this:

insert into stg_data
select 
  te.id_data, 
  te.id_device, 
  te.date_write, 
  te.date_event, 
  ted.i_inout, 
  ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
  and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
  and te.id_data=ted.id_data
  limit 5000;

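And then... to my surprise, the whole result set was stored in stg_data within 20 seconds.

To better understand the difference, I decided to look at the plans.

SELECT without LIMIT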
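Plans like the ones in this post can be produced with EXPLAIN. The post does not show the exact commands used, so the following is only a sketch built from the queries above. Plain EXPLAIN (without ANALYZE) only plans the statement and does not run it, so it is safe to use on the INSERT that never finishes.

```sql
-- Show the plan of the INSERT ... SELECT without executing it.
-- Without the ANALYZE keyword, EXPLAIN only plans the statement,
-- so no rows are fetched from Oracle or written to stg_data.
EXPLAIN
insert into stg_data
select
  te.id_data,
  te.id_device,
  te.date_write,
  te.date_event,
  ted.i_inout,
  ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device = 2749651
  and te.date_event >= '2019-01-16' and te.date_event < '2019-01-17'
  and te.id_data = ted.id_data;
```

Dropping the `insert into stg_data` line, or appending `limit 5000`, gives the other three variants to compare.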

 Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
   Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/ r1."ID_DATA",
   r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT", r2."I_INOUT",
   r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER JOIN
   "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA") AND
  (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
  TIMESTAMP))) AND (r1."DATE_EVENT" < 
  (CAST ('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP))) 
  AND (r1."ID_DEVICE" = 2749651))

SELECT with LIMIT

 Limit  (cost=10000.00..20000.00 rows=1000 width=548)
   ->  Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
      Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
      r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT", 
      r2."I_INOUT", r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER 
      JOIN "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA")
      AND (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS 
      TIMESTAMP))) AND (r1."DATE_EVENT" < (CAST ('2019-01-17
      00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."ID_DEVICE" = 2749651))

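So both variants send essentially the same query to Oracle, and the LIMIT is applied locally once the rows have been fetched.

Do the INSERT-SELECT plans look the same? No!

INSERT-SELECT with LIMIT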

Insert on stg_data_hist  (cost=10000.00..20010.00 rows=1000 width=548)
   ->  Limit  (cost=10000.00..20000.00 rows=1000 width=548)
         ->  Foreign Scan  (cost=10000.00..20000.00 rows=1000 width=548)
               Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/ 
               r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE", 
               r1."DATE_EVENT", r2."I_INOUT", r2."VALUE" FROM 
               ("DISPATCH"."T_EVENT" r1 INNER JOIN 
               "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = 
               r2."ID_DATA") AND (r1."DATE_EVENT" >= (CAST ('2019-01-16 
               00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."DATE_EVENT" < 
               (CAST('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP))) AND 
               (r1."ID_DEVICE" = 2749651))

INSERT-SELECT without LIMIT

Insert on stg_data_hist  (cost=30012.50..40190.00 rows=5000 width=548)
 ->  Hash Join  (cost=30012.50..40190.00 rows=5000 width=548)
       Hash Cond: (te.id_data = ted.id_data)
     ->  Foreign Scan on t_event te  (cost=10000.00..20000.00 rows=1000 width=28)
           Oracle query: SELECT /*93379c271b3f1bc08a1dbb94fb89f739*/ 
           r3."ID_DATA", r3."ID_DEVICE", r3."DATE_WRITE", r3."DATE_EVENT" 
           FROM "DISPATCH"."T_EVENT" r3 WHERE (r3."DATE_EVENT" >= 
           (CAST ('2019-01-16 00:00:00.000000 AD' AS TIMESTAMP))) AND 
           (r3."DATE_EVENT" < (CAST ('2019-01-17 00:00:00.000000 AD' AS 
           TIMESTAMP))) AND (r3."ID_DEVICE" = 2749651)
       ->  Hash  (cost=20000.00..20000.00 rows=1000 width=528)
           ->  Foreign Scan on t_event_data ted  
                  (cost=10000.00..20000.00 rows=1000 width=528)
                 Oracle query: SELECT /*21c8741f2fa8a8d13d037c3191e8ac96*/ 
                    r4."ID_DATA", r4."I_INOUT", r4."VALUE" FROM 
                    "DISPATCH"."T_EVENT_DATA" r4

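This explains why one takes so much longer than the other. Without the LIMIT, PostgreSQL fetches the filtered rows from the first foreign table and the entire contents of the second foreign table, then joins them locally. That takes a very long time: it means transferring millions of rows instead of a few thousand.

Finally, my two questions:

1) I would like to get the first plan but get rid of the LIMIT clause (it sends shivers down my spine :-)). How would you do that? I do not intend to apply any filter to ext.t_event_data other than the join clause.

2) Why do the two INSERT-SELECT plans look so different even though the two SELECT plans look so similar?

Thanks for reading, and have a nice day.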

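1 answer:

Answer 0: (score: 0)

The planner seems to think it will get only a few thousand rows either way, which is apparently way off. Make sure the statistics for the foreign tables are up to date by running 'ANALYZE ext.t_event', and do the same for ext.t_event_data, because: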

https://github.com/laurenz/oracle_fdw

  

PostgreSQL will not automatically gather statistics for foreign tables with the autovacuum daemon.

     

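Keep in mind that analyzing an Oracle foreign table will result in a full sequential table scan. You can use the table option sample_percent to speed this up by using only a sample of the Oracle table.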
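As a sketch of what that could look like for the two tables in the question (the value 10 is only an illustrative assumption, not a recommendation from the oracle_fdw documentation, and none of this has been run against the asker's database):

```sql
-- Optional: sample only part of the large Oracle table during ANALYZE.
-- '10' is an arbitrary example value; use SET instead of ADD
-- if the option is already defined on the foreign table.
ALTER FOREIGN TABLE ext.t_event_data OPTIONS (ADD sample_percent '10');

-- Gather statistics manually; autovacuum will not do this for foreign tables.
ANALYZE ext.t_event;
ANALYZE ext.t_event_data;
```

After that, re-running EXPLAIN on the INSERT shows whether the row estimates and the plan have changed.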

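The join is pushed down to Oracle in the SELECT case and, when the LIMIT is used, in the INSERT case as well. So the only reason I can see for it not being pushed down in the INSERT without LIMIT is the lack of accurate table statistics. You could try rewriting the INSERT query as a CTE (I have not tested this query, for obvious reasons):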

WITH foreign_data AS (
select 
  te.id_data, 
  te.id_device, 
  te.date_write, 
  te.date_event, 
  ted.i_inout, 
  ted.value
from ext.t_event te, ext.t_event_data ted 
where te.id_device =2749651 
  and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17' 
  and te.id_data=ted.id_data
)

insert into stg_data
select * from foreign_data;

You could also try rewriting the query with an explicit inner join instead of putting the join condition (te.id_data = ted.id_data) in the WHERE clause.
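A sketch of that rewrite, untested like the CTE version and with no guarantee that the planner will choose a different plan:

```sql
-- Same INSERT as in the question, with the join written explicitly.
-- The filters on te stay in the WHERE clause; only the join
-- condition moves into the ON clause.
insert into stg_data
select
  te.id_data,
  te.id_device,
  te.date_write,
  te.date_event,
  ted.i_inout,
  ted.value
from ext.t_event te
  inner join ext.t_event_data ted on te.id_data = ted.id_data
where te.id_device = 2749651
  and te.date_event >= '2019-01-16'
  and te.date_event < '2019-01-17';
```

Prefixing the statement with EXPLAIN shows whether the join now appears inside a single Foreign Scan, as it does in the plans with LIMIT.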