Question

This seemingly simple query (one join) takes hours to run, even though the table has fewer than 1.5 million rows...

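Each Product can have many RetailerProduct items (a one-to-many relationship through a foreign key), and I want to find all Products whose related RetailerProducts do not include any instance with retailer_id=1.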

There are about 1.5 million rows in Product, and about 1.1 million rows in RetailerProduct with retailer_id=1 (2.9 million rows in RetailerProduct in total).

Models:

from django.db import models

class Product(models.Model):
    ...
    upc = models.CharField(max_length=96, unique=True)
    ...

class RetailerProduct(models.Model):
    ...
    product = models.ForeignKey('project.Product',
                                related_name='retailer_offerings',
                                on_delete=models.CASCADE,
                                null=True)
    ...

    class Meta:
        unique_together = (("retailer", "retailer_product_id", "retailer_sku"),)

Query:

Product.objects.exclude(
   retailer_offerings__retailer_id=1).values_list('upc', flat=True)

Generated SQL:

SELECT "project_product"."upc" FROM "project_product" 
 WHERE NOT ("project_product"."id" IN 
  (SELECT U1."product_id" AS Col1 FROM "project_retailerproduct" U1 
    WHERE (U1."retailer_id" = 1 AND U1."product_id" IS NOT NULL))
  )
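The `U1."product_id" IS NOT NULL` condition in the generated SQL is not redundant. Under SQL's three-valued logic, `x NOT IN (list)` evaluates to NULL rather than TRUE when the list contains a NULL and no element equals x, so a single NULL product_id in the subquery would make the whole query return nothing. The sketch below demonstrates this with Python's built-in sqlite3 module and a hypothetical three-product dataset; it uses SQLite rather than PostgreSQL, but both follow the standard NULL semantics for IN.

```python
import sqlite3

# Toy schema loosely mirroring the question's tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_product (id INTEGER PRIMARY KEY, upc TEXT UNIQUE);
    CREATE TABLE project_retailerproduct (
        id INTEGER PRIMARY KEY,
        retailer_id INTEGER,
        product_id INTEGER  -- nullable, like ForeignKey(null=True)
    );
    INSERT INTO project_product (id, upc) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    -- A is offered by retailer 1, B only by retailer 2, C by nobody;
    -- one retailer-1 row has no product at all (product_id IS NULL).
    INSERT INTO project_retailerproduct (retailer_id, product_id)
        VALUES (1, 1), (2, 2), (1, NULL);
""")

def upcs(sql):
    """Run a query and return the sorted list of upc values."""
    return sorted(row[0] for row in conn.execute(sql))

# Without the guard, the NULL in the subquery makes NOT IN yield NULL
# (not TRUE) for every non-matching product, so no rows come back.
without_guard = upcs("""
    SELECT upc FROM project_product
    WHERE NOT (id IN (SELECT product_id FROM project_retailerproduct
                      WHERE retailer_id = 1))
""")

# With the IS NOT NULL guard, the expected products are returned.
with_guard = upcs("""
    SELECT upc FROM project_product
    WHERE NOT (id IN (SELECT product_id FROM project_retailerproduct
                      WHERE retailer_id = 1 AND product_id IS NOT NULL))
""")

print(without_guard)  # []
print(with_guard)     # ['B', 'C']
```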

The query takes hours to run. EXPLAIN in the psql shell shows:

 QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Seq Scan on project_product  (cost=0.00..287784596160.17 rows=725892 width=13)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..393961.19 rows=998211 width=4)
           ->  Seq Scan on project_retailerproduct u1  (cost=0.00..385070.14 rows=998211 width=4)
                 Filter: ((product_id IS NOT NULL) AND (retailer_id = 1))
(6 rows)

I wanted to post the EXPLAIN ANALYZE output, but it is still running.

Why is the cost of the Seq Scan on project_product so high? Any suggestions for optimizing it?

Answer 1

"About 1.1 million rows in RetailerProduct with retailer_id=1 (2.9 million rows in RetailerProduct in total)"

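You are selecting 1.1 million rows out of 2.9 million. Even if you have an index on retailer_id, it won't be used here: you are reading almost half the table (about 38%), so the planner will choose a full table scan.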

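Next, remember that WHERE NOT IN queries are often slow. In your case, product ids are compared against a list of about 1.1 million rows. Your plan shows why this hurts: the subquery is a plain SubPlan over a Materialize node rather than a hashed SubPlan (PostgreSQL only hashes such a subplan when it estimates the result will fit in work_mem), so for each of the roughly 1.5 million product rows the materialized list may be scanned again, on the order of 1.5 million × 1 million comparisons in the worst case. After that you still fetch the surviving rows, which could number in the hundreds of thousands. You might consider a LIMIT, but even then the query probably won't be much faster.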
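To illustrate why the non-hashed SubPlan is so expensive, the sketch below compares two simplified ways of evaluating `id NOT IN (...)`: rescanning a list of the subquery's values for every outer row (roughly what a plain SubPlan over Materialize does) versus building a hash set once and probing it per row (roughly what a hashed SubPlan or a hash anti join does). This is a toy model, not PostgreSQL's executor; it counts comparisons instead of timing, and the sizes are hypothetical and scaled down.

```python
def not_in_linear(outer_ids, inner_ids):
    """Per outer row, scan the inner list until a match (plain SubPlan analogue)."""
    comparisons = 0
    result = []
    for oid in outer_ids:
        found = False
        for iid in inner_ids:
            comparisons += 1
            if iid == oid:
                found = True
                break
        if not found:
            result.append(oid)
    return result, comparisons


def not_in_hashed(outer_ids, inner_ids):
    """Build a hash set once, then probe it per outer row (hashed analogue)."""
    inner = set(inner_ids)  # one pass over the inner side
    probes = 0
    result = []
    for oid in outer_ids:
        probes += 1
        if oid not in inner:
            result.append(oid)
    # Work = one insert per inner value + one probe per outer row.
    return result, len(inner_ids) + probes


# Toy sizes: 2,000 products, half of them (the even ids) offered by retailer 1.
products = list(range(2000))
offered = [i for i in products if i % 2 == 0]

linear_rows, linear_work = not_in_linear(products, offered)
hashed_rows, hashed_work = not_in_hashed(products, offered)

assert linear_rows == hashed_rows  # same answer either way
print(linear_work, hashed_work)    # 1500500 3000
```

The linear strategy grows with the product of the two input sizes, while the hashed one grows with their sum. At the question's sizes (about 1.5 million products and about 1 million subquery rows) that is roughly 10^11 to 10^12 comparisons versus a few million, which is consistent with the enormous cost estimate in the plan.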

So this is not a query that can easily be optimized; you may want to use a completely different query. Here is an example raw query:

SELECT "project_product"."upc" FROM "project_product" LEFT JOIN
 (SELECT product_id FROM "project_retailerproduct"
  WHERE retailer_id = 1)
AS retailer
ON project_product.id = retailer.product_id
WHERE retailer.product_id IS NULL;
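A rewrite like this should return exactly the same rows as the original NOT IN query. The sketch below checks that on a hypothetical toy dataset using Python's built-in sqlite3 module (not PostgreSQL, and not the question's data), running the answer's LEFT JOIN query with the duplicated WHERE keyword removed. It also includes a NOT EXISTS form, a common alternative that is not part of the original answer. In PostgreSQL both the LEFT JOIN ... IS NULL form and NOT EXISTS are typically planned as an anti join, which avoids the per-row SubPlan; whether that happens on the real tables should be confirmed with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_product (id INTEGER PRIMARY KEY, upc TEXT UNIQUE);
    CREATE TABLE project_retailerproduct (
        id INTEGER PRIMARY KEY,
        retailer_id INTEGER,
        product_id INTEGER
    );
    INSERT INTO project_product (id, upc) VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO project_retailerproduct (retailer_id, product_id) VALUES
        (1, 1),     -- A is offered by retailer 1
        (1, 1),     -- a second retailer-1 offering of A
        (2, 2),     -- B is offered only by retailer 2
        (1, 4),     -- D is offered by retailer 1 ...
        (2, 4),     -- ... and by retailer 2
        (1, NULL);  -- retailer-1 row with no product
""")

QUERIES = {
    # The Django-generated form (NOT IN with the IS NOT NULL guard).
    "not_in": """
        SELECT upc FROM project_product
        WHERE NOT (id IN (SELECT product_id FROM project_retailerproduct
                          WHERE retailer_id = 1 AND product_id IS NOT NULL))
    """,
    # The answer's LEFT JOIN anti-join: unmatched products get NULLs from the
    # joined side, and the IS NULL filter keeps only those, each exactly once.
    "left_join": """
        SELECT project_product.upc FROM project_product
        LEFT JOIN (SELECT product_id FROM project_retailerproduct
                   WHERE retailer_id = 1) AS retailer
          ON project_product.id = retailer.product_id
        WHERE retailer.product_id IS NULL
    """,
    # NOT EXISTS alternative (not from the answer); NULL product_ids never
    # satisfy rp.product_id = p.id, so no explicit guard is needed.
    "not_exists": """
        SELECT p.upc FROM project_product AS p
        WHERE NOT EXISTS (SELECT 1 FROM project_retailerproduct AS rp
                          WHERE rp.retailer_id = 1 AND rp.product_id = p.id)
    """,
}

results = {name: sorted(r[0] for r in conn.execute(sql))
           for name, sql in QUERIES.items()}
print(results)
```

All three forms return B and C on this data. On the real database, the useful comparison is the EXPLAIN output: if the rewritten query shows an anti join node (for example a Hash Anti Join) instead of `Filter: (NOT (SubPlan 1))`, the per-row rescans are gone.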
