Question

I am writing a query to summarize data in a Postgres database:

SELECT products.id,
       products.NAME,
       product_types.type_name AS product_type,
       delivery_types.delivery,
       products.required_selections,
       Count(s.id) AS selections_count,
       Sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products
       JOIN product_types ON product_types.id = products.product_type_id
       JOIN delivery_types ON delivery_types.id = products.delivery_type_id
       LEFT JOIN selections_products sp ON products.id = sp.product_id
       LEFT JOIN selections s ON s.id = sp.selection_id
       LEFT JOIN selection_statuses ss ON ss.id = s.selection_status_id
       LEFT JOIN listings l ON ( s.listing_id = l.id
                                 AND l.local_date_time BETWEEN
                                     To_timestamp('2014/12/01', 'YYYY/mm/DD')
                                     AND To_timestamp('2014/12/30', 'YYYY/mm/DD') )
GROUP  BY products.id,
          product_types.type_name,
          delivery_types.delivery

Basically, we have a product with selections, these selections have listings, and the listings have a local_date. I need a list of all products and how many listings each has between the two dates. No matter what I do, I get a count of all selections (a total). I feel like I'm overlooking something. The same concept goes for warning_count. Also, I don't really understand why Postgres requires me to add a group by here.

The schema looks like this (the parts you'd care about, anyway):

[schema not preserved in this copy]

Answer 1

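You get a count of all selections regardless, because the condition on listings.local_date_time sits in a LEFT JOIN: it only decides which listings rows get attached and never removes rows from selections.

There is room for interpretation; we would need to see the actual table definitions with all constraints and data types. Going out on a limb, my educated guess is that you can fix your query by using parentheses in the FROM clause to prioritize joins: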

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , count(s.id) AS selections_count
     , sum(CASE WHEN ss.status = 'WARNING' THEN 1 ELSE 0 END) AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (  -- LEFT JOIN!
          selections_products sp
   JOIN   selections s  ON s.id  = sp.selection_id  -- INNER JOIN!
   JOIN   listings   l  ON l.id  = s.listing_id     -- INNER JOIN!
                       AND l.local_date_time >= '2014-12-01'
                       AND l.local_date_time <  '2014-12-31'
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   ) ON sp.product_id = p.id
GROUP  BY p.id, pt.type_name, dt.delivery;

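This way, you first eliminate all selections outside the given time frame with [INNER] JOIN before you LEFT JOIN to products, thus keeping all products in the result, including those that are not in any applicable selection.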
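To see the difference on a toy data set, here is a minimal sketch using Python's built-in sqlite3 module, so it runs without a Postgres server. Everything in it is an assumption made for illustration: the four cut-down tables, the sample rows, timestamps stored as ISO 8601 text, and the function name selection_counts. The filtered variant wraps the inner joins in a subquery rather than in bare parentheses; that is a swapped-in form expressing the same join order, not the exact query above.

```python
import sqlite3


def selection_counts():
    """Return (chained, filtered) per-product selection counts."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, cut-down schema and rows; not the asker's real tables.
    con.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE listings (id INTEGER PRIMARY KEY, local_date_time TEXT);
        CREATE TABLE selections (id INTEGER PRIMARY KEY, listing_id INTEGER);
        CREATE TABLE selections_products (product_id INTEGER, selection_id INTEGER);

        INSERT INTO products VALUES (1, 'with selections'), (2, 'without selections');
        INSERT INTO listings VALUES (10, '2014-12-15 09:00:00'),  -- inside the range
                                    (11, '2014-11-15 09:00:00');  -- outside the range
        INSERT INTO selections VALUES (100, 10), (101, 11);
        INSERT INTO selections_products VALUES (1, 100), (1, 101);
    """)

    # Chained LEFT JOINs: the date condition only affects which listings
    # rows are attached, so both selections of product 1 are counted.
    chained = con.execute("""
        SELECT p.id, count(s.id)
        FROM   products p
        LEFT   JOIN selections_products sp ON sp.product_id = p.id
        LEFT   JOIN selections s ON s.id = sp.selection_id
        LEFT   JOIN listings l ON l.id = s.listing_id
                              AND l.local_date_time >= '2014-12-01'
                              AND l.local_date_time <  '2014-12-31'
        GROUP  BY p.id
        ORDER  BY p.id
    """).fetchall()

    # Inner joins first, LEFT JOIN to products last: out-of-range
    # selections are gone before products are attached.
    filtered = con.execute("""
        SELECT p.id, count(x.selection_id)
        FROM   products p
        LEFT   JOIN (
           SELECT sp.product_id, s.id AS selection_id
           FROM   selections_products sp
           JOIN   selections s ON s.id = sp.selection_id
           JOIN   listings   l ON l.id = s.listing_id
           WHERE  l.local_date_time >= '2014-12-01'
           AND    l.local_date_time <  '2014-12-31'
        ) x ON x.product_id = p.id
        GROUP  BY p.id
        ORDER  BY p.id
    """).fetchall()

    con.close()
    return chained, filtered
```

With these sample rows, the chained variant counts both selections for product 1, the filtered variant counts only the one whose listing falls in December, and product 2 stays in both results with a count of 0.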

Related:

Join four tables involving LEFT JOIN without duplicates

While selecting all or most products, this can be rewritten to be faster:

SELECT p.id
     , p.name
     , pt.type_name AS product_type
     , dt.delivery
     , p.required_selections
     , COALESCE(s.selections_count, 0) AS selections_count
     , COALESCE(s.warning_count, 0)    AS warning_count
FROM   products       p
JOIN   product_types  pt ON pt.id = p.product_type_id
JOIN   delivery_types dt ON dt.id = p.delivery_type_id
LEFT   JOIN (
   SELECT sp.product_id
        , count(*) AS selections_count
        , count(*) FILTER (WHERE ss.status = 'WARNING') AS warning_count
   FROM   selections_products sp
   JOIN   selections          s  ON s.id  = sp.selection_id
   JOIN   listings            l  ON l.id  = s.listing_id
   LEFT   JOIN selection_statuses ss ON ss.id = s.selection_status_id
   WHERE  l.local_date_time >= '2014-12-01'
   AND    l.local_date_time <  '2014-12-31'
   GROUP  BY 1
   ) s ON s.product_id = p.id;

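It is cheaper to aggregate and count selections and warnings per product_id first, and then join to products. (If you retrieve only a small subset of products, though, it is cheaper to reduce the related rows first.)

Related: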

Why does the following join increase the query time significantly?

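"Also, I don't really understand why Postgres requires me to add a group by here."

Since Postgres 9.1, the PK column in GROUP BY covers all columns of the same table. It does not cover columns of other tables, even if they are functionally dependent. You need to list those explicitly in GROUP BY if you don't want to aggregate them.

My second query avoids this problem by aggregating before the join.

Aside: chances are, this does not do what you want: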

l.local_date_time BETWEEN To_timestamp('2014/12/01', 'YYYY/mm/DD') AND To_timestamp('2014/12/30', 'YYYY/mm/DD')

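Since local_date_time seems to be of type timestamp (not timestamptz!), you would include '2014-12-30 00:00', but exclude the rest of the day '2014-12-30'. And it is always better to use ISO 8601 format for dates and timestamps, which means the same with every locale and datestyle setting. Hence: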

WHERE l.local_date_time >= '2014-12-01' AND l.local_date_time < '2014-12-31'

This includes all of '2014-12-30' and nothing beyond it. No idea why you chose to exclude '2014-12-31'. Maybe you really want to include all of December 2014?

WHERE l.local_date_time >= '2014-12-01' AND l.local_date_time < '2015-01-01'
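The boundary behaviour can be checked outside the database as well. The following is only an illustrative sketch in Python: naive datetime values stand in for Postgres timestamp values (an assumption), and the helper names in_between and in_half_open are made up for this example.

```python
from datetime import datetime


def in_between(ts, lo, hi):
    """Mimic SQL BETWEEN: both bounds are inclusive."""
    return lo <= ts <= hi


def in_half_open(ts, lo, hi):
    """Half-open range: lower bound inclusive, upper bound exclusive."""
    return lo <= ts < hi


start = datetime(2014, 12, 1)
midnight = datetime(2014, 12, 30)           # 2014-12-30 00:00
afternoon = datetime(2014, 12, 30, 15, 0)   # 2014-12-30 15:00

# BETWEEN ... AND '2014-12-30' keeps midnight but drops the afternoon.
between_midnight = in_between(midnight, start, datetime(2014, 12, 30))
between_afternoon = in_between(afternoon, start, datetime(2014, 12, 30))

# >= '2014-12-01' AND < '2014-12-31' keeps the whole day of Dec 30.
half_open_afternoon = in_half_open(afternoon, start, datetime(2014, 12, 31))
```

Here between_midnight and half_open_afternoon are true, while between_afternoon is false, which is the reason for preferring the half-open form with an exclusive upper bound.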
