Comparing two large tables by various attributes - PostgreSQL

Date: 2019-03-24 08:58:31

Tags: sql postgresql

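I am having a hard time coming up with an efficient query that compares two tables across different attributes. This is for a report for an online retailer with hundreds of thousands of SKUs for sale. Each SKU is a variation of a "parent" product. They sell on various marketplaces, so they need to see whether some items are not listed on every marketplace.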

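One table holds all parent products, and another holds all variations with their corresponding SKUs. A third table lists every SKU (variation) together with the marketplace it is listed on; each sku + marketplace combination is unique.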

The database is PostgreSQL.

The table structure is as follows:

Products table:

Products
id |  parent_sku  |  vendor_id
-------------------------------
 1 |     ABC      |     100
 2 |     DEF      |     200
 3 |     XYZ      |     100

Variations table:

Variations
id |  parent_id  |   sku
----------------------------
 1 |     1       |   ABC-1
 2 |     1       |   ABC-2
 3 |     1       |   ABC-3
 4 |     2       |   DEF-1
 5 |     2       |   DEF-2
 6 |     3       |   XYZ-1
 7 |     3       |   XYZ-2

Marketplace table:

MarketplaceData
 id |   sku   |   marketplace  | price
---------------------------------------
 1  |  ABC-1  |     website1   | 99.99
 2  |  ABC-2  |     website1   | 99.99
 3  |  ABC-3  |     website1   | 89.99
 4  |  DEF-1  |     website1   | 29.99
 5  |  DEF-2  |     website1   | 29.99
 6  |  XYZ-1  |     website1   | 39.99
 7  |  XYZ-2  |     website1   | 39.99
 8  |  ABC-1  |     website2   | 99.99
 9  |  ABC-2  |     website2   | 99.99
 10 |  ABC-3  |     website2   | 99.99
 11 |  DEF-1  |     website2   | 29.99
 12 |  DEF-2  |     website2   | 29.99
 13 |  XYZ-1  |     website2   | 34.99
 14 |  XYZ-2  |     website2   | 34.99

I have a query that works, but it takes a very long time to execute and is very resource-intensive.

SELECT DISTINCT parent_id FROM Variations 
WHERE sku IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')) 
AND sku NOT IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4')) 
LIMIT 20 OFFSET 0 

Because each sku + marketplace data set has nearly 400,000 rows, and the MarketplaceData table holds over 2 million rows, the query takes forever to run.

As for indexes, the id column is the primary key of each table. The Variations table has an index on sku (which must be unique), and MarketplaceData has an index on sku + marketplace.

Ultimately, what I need is a list of the unique parent_id values that match the criteria.

Any help or guidance would be greatly appreciated.

Thanks!

3 Answers:

Answer 0: (score: 1)

Instead of IN and NOT IN, you can use an INNER JOIN and a LEFT JOIN with a null check:

SELECT DISTINCT v.parent_id
FROM Variations v
INNER JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')
) t1 ON t1.sku = v.sku
LEFT JOIN (
    SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4')
) t2 ON t2.sku = v.sku
WHERE t2.sku IS NULL
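
A closely related formulation, which is not part of the original answer, expresses the same two conditions with EXISTS and NOT EXISTS. PostgreSQL can generally plan these as a semi-join and an anti-join, and NOT EXISTS avoids the surprising behavior of NOT IN when the subquery returns a NULL. A minimal sketch, assuming the table and column names from the question:

```sql
-- Parents with at least one SKU listed on website1/website2
-- that is not listed on website3/website4.
SELECT DISTINCT v.parent_id
FROM Variations v
WHERE EXISTS (
        SELECT 1
        FROM MarketplaceData m
        WHERE m.sku = v.sku
          AND m.marketplace IN ('website1', 'website2')
      )
  AND NOT EXISTS (
        SELECT 1
        FROM MarketplaceData m
        WHERE m.sku = v.sku
          AND m.marketplace IN ('website3', 'website4')
      );
```

Whether this is faster than the join version depends on the data and the plan chosen, so it is worth comparing both with EXPLAIN ANALYZE.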

Answer 1: (score: 0)

Why not use just a single subquery?

SELECT DISTINCT parent_id 
FROM Variations 
WHERE sku IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1','website2')
              EXCEPT
              SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3','website4'))
LIMIT 20 OFFSET 0 
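
To check the logic on a small scale before running it against millions of rows, the query above can be tried on temporary tables. The sketch below uses a few rows of the sample data from the question plus one hypothetical row for website3, which is an assumption added here only so that the exclusion has something to act on. The temporary tables shadow any real tables of the same name for the duration of the session, so this is best run in a scratch database.

```sql
-- Scratch copies of the two tables involved (temporary, session-local).
CREATE TEMP TABLE Variations (
    id        integer PRIMARY KEY,
    parent_id integer NOT NULL,
    sku       text    NOT NULL UNIQUE
);

CREATE TEMP TABLE MarketplaceData (
    id          integer PRIMARY KEY,
    sku         text    NOT NULL,
    marketplace text    NOT NULL,
    price       numeric(10, 2),
    UNIQUE (sku, marketplace)
);

INSERT INTO Variations (id, parent_id, sku) VALUES
    (1, 1, 'ABC-1'),
    (2, 1, 'ABC-2'),
    (4, 2, 'DEF-1'),
    (6, 3, 'XYZ-1');

INSERT INTO MarketplaceData (id, sku, marketplace, price) VALUES
    (1,  'ABC-1', 'website1', 99.99),
    (2,  'ABC-2', 'website1', 99.99),
    (4,  'DEF-1', 'website1', 29.99),
    (6,  'XYZ-1', 'website1', 39.99),
    (8,  'ABC-1', 'website2', 99.99),
    (15, 'DEF-1', 'website3', 29.99);  -- hypothetical row, not in the question's data

-- Same shape as the query above, with ORDER BY so the result is deterministic.
SELECT DISTINCT parent_id
FROM Variations
WHERE sku IN (SELECT sku FROM MarketplaceData WHERE marketplace IN ('website1', 'website2')
              EXCEPT
              SELECT sku FROM MarketplaceData WHERE marketplace IN ('website3', 'website4'))
ORDER BY parent_id;
```

With this data, DEF-1 is removed by the EXCEPT branch because it also appears on website3, so the query returns parent_id 1 and 3.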

Answer 2: (score: 0)

How about getting the skus with a simple aggregation?

select mpd.sku
from MarketplaceData mpd
where mpd.marketplace in ('website1', 'website2', 'website3', 'website4')
group by mpd.sku
having count(*) filter (where mpd.marketplace in ('website1', 'website2')) > 0 and
       count(*) filter (where mpd.marketplace in ('website3', 'website4')) = 0;

Then get the parent ids:

select distinct v.parent_id
from variations v join
     (select mpd.sku
      from MarketplaceData mpd
      where mpd.marketplace in ('website1', 'website2', 'website3', 'website4')
      group by mpd.sku
      having count(*) filter (where mpd.marketplace in ('website1', 'website2')) > 0 and
             count(*) filter (where mpd.marketplace in ('website3', 'website4')) = 0
     ) m
     on m.sku = v.sku;
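
All of these queries filter MarketplaceData by marketplace first, while the existing index described in the question leads with sku. One thing that may help, offered as an assumption to verify rather than a guaranteed fix, is an additional index that leads with marketplace. The index name below is hypothetical, and the EXPLAIN call shows one way to compare plans before and after. Note that EXPLAIN with ANALYZE actually executes the query.

```sql
-- Hypothetical index name; the column order (marketplace, sku) is the assumption being tested.
CREATE INDEX IF NOT EXISTS marketplacedata_marketplace_sku_idx
    ON MarketplaceData (marketplace, sku);

-- Refresh planner statistics so the new index is costed sensibly.
ANALYZE MarketplaceData;

-- Show the actual plan, timings, and buffer usage for the aggregation query.
EXPLAIN (ANALYZE, BUFFERS)
select distinct v.parent_id
from variations v join
     (select mpd.sku
      from MarketplaceData mpd
      where mpd.marketplace in ('website1', 'website2', 'website3', 'website4')
      group by mpd.sku
      having count(*) filter (where mpd.marketplace in ('website1', 'website2')) > 0 and
             count(*) filter (where mpd.marketplace in ('website3', 'website4')) = 0
     ) m
     on m.sku = v.sku;
```

If the plan still shows a sequential scan over MarketplaceData, the marketplace filter is probably selecting too large a share of the table for the index to pay off, and the gain would have to come from the query shape instead.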