Question

The following query sometimes takes 15 seconds to run and sometimes takes 90 minutes. What could cause such a huge difference?

INSERT INTO missing_products 
SELECT table_name, 
   product_id 
FROM   products 
WHERE  table_name = 'xxxxxxxxx' 
   AND product_id NOT IN (SELECT id 
                               FROM new_products);

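I have tried running EXPLAIN on it, and the only thing that stands out is an index only scan on new_products. I have also rewritten the query to use a left join and insert the rows where the right side is NULL, but that shows the same timing problem.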
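
For reference, the left-join variant described above would presumably look something like the sketch below. The exact rewritten query was not posted, so the aliases p and n and the explicit column list are assumptions.

```sql
-- Anti-join written as LEFT JOIN ... IS NULL (illustrative sketch only).
INSERT INTO missing_products (table_name, product_id)
SELECT p.table_name,
       p.product_id
FROM   products p
       LEFT JOIN new_products n
              ON n.id = p.product_id
WHERE  p.table_name = 'xxxxxxxxx'
   AND n.id IS NULL;  -- keep only products with no match in new_products
```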

I have the following tables, with the structures shown below.

products

id bigint not null,
product_id text not null,
table_name text not null,
primary key (id),
unique index (product_id)

new_products

id text not null,
title text not null,
primary key, btree (id)

missing_products

table_name text not null,
product_id text not null,
primary key (table_name, product_id)

EXPLAIN ANALYZE output - this run has extra fields in the WHERE clause, but it should give a good idea. It took 22 seconds.

 Insert on missing_products  (cost=5184.80..82764.35 rows=207206 width=38) (actual time=22466.525..22466.525 rows=0 loops=1)
   ->  Seq Scan on products  (cost=5184.80..82764.35 rows=207206 width=38) (actual time=0.055..836.217 rows=411150 loops=1)
         Filter: ((active > (-30)) AND (NOT (hashed SubPlan 1)) AND (feed = 'xxxxxxxx'::text))
         Rows Removed by Filter: 77436
         SubPlan 1
           ->  Index Only Scan using new_products_pkey on new_products  (cost=0.39..5184.74 rows=23 width=10) (actual time=0.027..0.027 rows=0 loops=1)
                 Heap Fetches: 0
 Planning time: 0.220 ms
 Execution time: 22466.596 ms

Answer 1

Looking at the EXPLAIN ANALYZE output, the SELECT clearly takes barely 800 milliseconds. Most of the 22 seconds is spent inserting the rows.
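
In the plan above, the Seq Scan node finishes at about 836 ms while the Insert node finishes at about 22466 ms, so roughly 21.6 seconds are attributable to the insert itself. One way to repeat that measurement on a slow run without keeping the inserted rows is to wrap EXPLAIN ANALYZE in a transaction and roll it back. This is a minimal sketch using the query from the question; the BUFFERS option is an addition here, not something the original poster ran.

```sql
BEGIN;

-- EXPLAIN ANALYZE really executes the INSERT, so the timings are real.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO missing_products
SELECT table_name,
       product_id
FROM   products
WHERE  table_name = 'xxxxxxxxx'
   AND product_id NOT IN (SELECT id
                          FROM new_products);

-- Discard the inserted rows so the test can be repeated.
ROLLBACK;
```

Comparing the output of a fast run with a slow run should show whether the extra time lands in the scan or in the Insert node.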

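Also, the statistics for your new_products table appear to be inaccurate: the planner predicted 23 rows while the actual row count was 0. The plan looks acceptable for now, but it could become disastrous depending on how the table is used across the application. If autoanalyze is not kicking in, I would run ANALYZE on new_products at regular intervals and monitor performance over a day.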
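
A minimal sketch of that check, assuming the table lives in a schema visible through pg_stat_user_tables:

```sql
-- Refresh planner statistics for the table by hand.
ANALYZE new_products;

-- See when statistics were last gathered, manually or by autovacuum.
SELECT relname,
       n_live_tup,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'new_products';
```

If last_autoanalyze stays NULL or very old while the table is being loaded and emptied, that would suggest autoanalyze is not keeping up.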

Answer 2

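I would try two things:

Try adding an index on products.table_name, which you do not currently appear to have.
Try rewriting the query to use a NOT EXISTS clause instead of NOT IN. Sometimes the database can execute the query more efficiently that way.

The query using NOT EXISTS: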

INSERT INTO missing_products (table_name, product_id)
SELECT p.table_name, p.product_id 
  FROM products p
 WHERE p.table_name = 'xxxxxxxxx' 
   AND NOT EXISTS (SELECT null
                     FROM new_products n
                    WHERE n.id = p.product_id)
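
The first suggestion could be tried along these lines. The index name products_table_name_idx is made up for illustration, and whether the planner actually uses the index depends on how selective table_name is.

```sql
-- Hypothetical index name; any unused name works.
CREATE INDEX products_table_name_idx ON products (table_name);

-- Refresh statistics so the planner sees current data.
ANALYZE products;

-- Inspect the plan for the SELECT part of the NOT EXISTS version.
EXPLAIN
SELECT p.table_name,
       p.product_id
FROM   products p
WHERE  p.table_name = 'xxxxxxxxx'
   AND NOT EXISTS (SELECT 1
                   FROM   new_products n
                   WHERE  n.id = p.product_id);
```

Because new_products.id is a primary key and therefore never NULL, NOT IN and NOT EXISTS return the same rows here, so the rewrite only changes how the query can be planned.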

SQL query intermittently slows down in postgres
