Below is a simplified version of my problem. I have two tables. Each table has a unique ID field, but it is irrelevant in this scenario.
shipments has 3 fields: shipment_id, receive_by_datetime, and qty.
deliveries has 4 fields: delivery_id, shipment_id, delivered_on_datetime, and qty.
In shipments, the shipment_id and receive_by_datetime fields always match up. Many rows in the table look like duplicates based on those two columns (but they aren't... other fields differ).
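In deliveries, the shipment_id matches up to the shipments table. There are also many rows that look like duplicates based on the delivery_id and delivered_on_datetime fields (but again they aren't... other fields exist that I didn't list).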
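I am trying to pull one aggregated row per delivered_on_datetime and receive_by_datetime pair, but the many-to-many relationship makes this difficult. Is a query along these lines correct?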
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
FROM deliveries d
LEFT JOIN (
SELECT DISTINCT s1.shipment_id, s1.receive_by_datetime
FROM shipments s1
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
Answer 0: (score: 2)
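If the total of SUM(d.qty) across your result comes out greater than SELECT SUM(qty) FROM deliveries, then the join is duplicating delivery rows. Something like this might work better for you: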
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id, s1.receive_by_datetime
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
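The total check mentioned above can be tried on a small in-memory database. This is only a sketch using Python's sqlite3 module; the sample rows and dates are invented for illustration, and only delivered_qty is compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- Hypothetical data: shipment 1 has two rows with the same date.
INSERT INTO shipments VALUES (1, '2024-01-10', 5), (1, '2024-01-10', 7), (2, '2024-01-12', 3);
INSERT INTO deliveries VALUES (100, 1, '2024-01-09', 4), (101, 1, '2024-01-09', 6),
                              (102, 2, '2024-01-11', 3);
""")

# Pre-aggregate shipments so each (shipment_id, receive_by_datetime) appears once.
rows = conn.execute("""
    SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty
    FROM deliveries d
    LEFT JOIN (
        SELECT s1.shipment_id, s1.receive_by_datetime, SUM(s1.qty) AS qty
        FROM shipments s1
        GROUP BY s1.shipment_id, s1.receive_by_datetime
    ) s ON s.shipment_id = d.shipment_id
    GROUP BY d.delivered_on_datetime, s.receive_by_datetime
    ORDER BY d.delivered_on_datetime
""").fetchall()

# The grouped totals should add up to the plain total of deliveries.qty.
grouped_total = sum(r[2] for r in rows)
true_total = conn.execute("SELECT SUM(qty) FROM deliveries").fetchone()[0]
print(rows, grouped_total, true_total)
```

If grouped_total ever exceeds true_total on real data, some delivery row matched more than one row of the subquery.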
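If you somehow have (or might have) multiple values of receive_by_datetime for a single shipment_id, and it is best practice to assume that something else could slightly corrupt the data, then the following prevents rows of the deliveries table from being duplicated while still returning a valid result you can work with: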
SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty) AS delivered_qty, SUM(s.qty) AS shipped_qty
FROM deliveries d
LEFT JOIN (
SELECT s1.shipment_id, MAX(s1.receive_by_datetime) AS receive_by_datetime, SUM(s1.qty) AS qty
FROM shipments s1
GROUP BY s1.shipment_id
) s ON (s.shipment_id = d.shipment_id)
GROUP BY d.delivered_on_datetime, s.receive_by_datetime
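To see why grouping the subquery by shipment_id alone matters, here is a rough comparison in Python's sqlite3. The data is made up so that one shipment_id carries two different receive_by_datetime values; the helper delivered_rows is a hypothetical name used only in this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- Hypothetical corrupted data: shipment 1 has two different dates.
INSERT INTO shipments VALUES (1, '2024-01-10', 5), (1, '2024-01-11', 7);
INSERT INTO deliveries VALUES (100, 1, '2024-01-09', 4), (101, 1, '2024-01-09', 6);
""")

def delivered_rows(shipment_subquery):
    """Join deliveries to the given per-shipment subquery and aggregate."""
    sql = f"""
        SELECT d.delivered_on_datetime, s.receive_by_datetime, SUM(d.qty)
        FROM deliveries d
        LEFT JOIN ({shipment_subquery}) s ON s.shipment_id = d.shipment_id
        GROUP BY d.delivered_on_datetime, s.receive_by_datetime
        ORDER BY s.receive_by_datetime
    """
    return conn.execute(sql).fetchall()

# Grouping by both columns leaves two subquery rows for shipment 1,
# so every delivery is counted twice (once per date).
per_pair = delivered_rows("""
    SELECT shipment_id, receive_by_datetime, SUM(qty) AS qty
    FROM shipments GROUP BY shipment_id, receive_by_datetime
""")

# Grouping by shipment_id alone guarantees one row per shipment.
per_id = delivered_rows("""
    SELECT shipment_id, MAX(receive_by_datetime) AS receive_by_datetime, SUM(qty) AS qty
    FROM shipments GROUP BY shipment_id
""")
print(per_pair, per_id)
```

The MAX variant picks the latest date arbitrarily; whether that is the right date to report depends on the data, which the thread does not settle.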
Answer 1: (score: 1)
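Yes, the problem with many-to-many is that you get a Cartesian product of the matching rows, so you end up counting the same row more than once: once for every row on the other side that it matches.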
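A minimal illustration of that double counting, assuming a plain join with no DISTINCT or pre-aggregation (Python's sqlite3, invented sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
CREATE TABLE deliveries (delivery_id INTEGER, shipment_id INTEGER,
                         delivered_on_datetime TEXT, qty INTEGER);
-- Hypothetical data: two shipment rows and two delivery rows share shipment_id 1.
INSERT INTO shipments VALUES (1, '2024-01-10', 5), (1, '2024-01-10', 7);
INSERT INTO deliveries VALUES (100, 1, '2024-01-09', 4), (101, 1, '2024-01-09', 6);
""")

true_total = conn.execute("SELECT SUM(qty) FROM deliveries").fetchone()[0]

# Each delivery row pairs with both shipment rows, giving 2 x 2 = 4 joined rows.
naive_total = conn.execute("""
    SELECT SUM(d.qty)
    FROM deliveries d
    LEFT JOIN shipments s ON s.shipment_id = d.shipment_id
""").fetchone()[0]

print(true_total, naive_total)
```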
"In shipments, the shipment_id and receive_by_datetime fields always match up."
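If this means there cannot be two shipments with the same ID but different dates, then your query will work. In general, though, it is not safe: if the SELECT DISTINCT subquery can return more than one row per shipment ID, you will run into the double-counting problem. This is a very tricky problem to solve in general; in fact, I don't think it can be done with this data model.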
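Whether the assumption holds can be checked directly before trusting the query. The following is a sketch in Python's sqlite3 with invented rows; on a real database only the inner SELECT would be needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (shipment_id INTEGER, receive_by_datetime TEXT, qty INTEGER);
-- Hypothetical data: shipment 1 is consistent, shipment 2 has two dates.
INSERT INTO shipments VALUES (1, '2024-01-10', 5), (1, '2024-01-10', 7),
                             (2, '2024-01-12', 3), (2, '2024-01-13', 1);
""")

# List every shipment_id that maps to more than one receive_by_datetime.
conflicts = conn.execute("""
    SELECT shipment_id, COUNT(DISTINCT receive_by_datetime) AS n_dates
    FROM shipments
    GROUP BY shipment_id
    HAVING COUNT(DISTINCT receive_by_datetime) > 1
""").fetchall()

print(conflicts)
```

An empty result means the DISTINCT subquery returns at most one row per shipment ID on the current data; it says nothing about rows inserted later.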