Joining multiple tables with partial counts

Time: 2013-06-03 17:13:35

Tags: sql postgresql join aggregate-functions

Each company has products, and each product has entries in the detail1, detail2, and detail3 tables.

**Table Company**
cid |   cname   
-----+-----------
100 | Company 1
101 | Company 2

**Table Product**
pid  | cid |   dname   
------+-----+-----------
1000 | 100 | Product A
2000 | 101 | Product B

**Table detail1**
pid  | state |          datetime          
------+-------+----------------------------
1000 | A     | 2013-06-03 11:49:49.224992
1000 | B     | 2013-06-03 11:49:49.226124
1000 | B     | 2013-06-03 11:49:49.228573
1000 | B     | 2013-06-03 11:49:49.23136
1000 | A     | 2013-06-03 11:49:49.233897
2000 | A     | 2013-06-03 11:49:49.243572
2000 | B     | 2013-06-03 11:49:49.245899

**Table detail2**
pid  | type |          datetime          
------+------+----------------------------
1000 | T1   | 2013-06-03 11:49:49.257978
1000 | T1   | 2013-06-03 11:49:49.258865
1000 | T1   | 2013-06-03 11:49:49.261212
1000 | T1   | 2013-06-03 11:49:49.263515
2000 | T1   | 2013-06-03 11:49:49.270654

**Table detail3**
pid  | quality |          datetime          
------+---------+----------------------------
1000 | Q1      | 2013-06-03 11:49:49.280894
1000 | Q1      | 2013-06-03 11:49:49.281786
1000 | Q1      | 2013-06-03 11:49:49.284011
2000 | Q1      | 2013-06-03 11:49:49.287797
2000 | Q1      | 2013-06-03 11:49:49.288629
2000 | Q1      | 2013-06-03 11:49:49.289587

I am looking for a query that returns data like this:

CompanyID  CompanyName  detail1.StateA  detail1.stateB  count(detail2) count(detail3)
---------- ------------ --------------- --------------- -------------- ---------------
100        Company 1         2               3                4             3
101        Company 2         1               1                1             2 

I may restrict the results further with constraints on datetime.

1 answer:

Answer 0: (score: 2)

SELECT c.cid
      ,c.cname
      ,sum(d1.d1_a_ct) AS d1_a_ct
      ,sum(d1.d1_b_ct) AS d1_b_ct
      ,sum(d2.d2_ct)   AS d2_ct
      ,sum(d3.d3_ct)   AS d3_ct
FROM   company c
LEFT   JOIN product p USING (cid)
LEFT   JOIN (
   SELECT pid, count(state = 'A' OR NULL) AS d1_a_ct
              ,count(state = 'B' OR NULL) AS d1_b_ct
   FROM   detail1
   -- WHERE datetime >= '2013-06-03 11:45:00'
   -- AND   datetime <  '2013-06-05 15:00:00'
   GROUP  BY pid
   ) d1   USING (pid)
LEFT   JOIN (
   SELECT pid, count(*) AS d2_ct
   FROM   detail2
   GROUP  BY pid
   ) d2   USING (pid)
LEFT   JOIN (
   SELECT pid, count(*) AS d3_ct
   FROM   detail3
   GROUP  BY pid
   ) d3   USING (pid)
GROUP BY  c.cid, c.cname;

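It is essential to avoid a "proxy cross join" in this situation. If you join to multiple n-tables (detail1, detail2, ...) and each of them can have multiple related rows per product, the rows multiply each other.
To avoid this, aggregate the detail tables first, so that there is only 1 row per product. Then it is no problem to join all of them to the respective product at once.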
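
To make the multiplication concrete, here is a minimal sketch that joins the raw detail rows for product 1000 without aggregating first. The rows are copied from the question; the CTEs merely stand in for the real tables so that the snippet runs on its own.

```sql
-- Sample rows for product 1000 only, taken from the question.
-- The CTEs shadow the real tables so the snippet is self-contained.
WITH detail1(pid, state) AS (
   VALUES (1000, 'A'), (1000, 'B'), (1000, 'B'), (1000, 'B'), (1000, 'A')
)
, detail2(pid, type) AS (
   VALUES (1000, 'T1'), (1000, 'T1'), (1000, 'T1'), (1000, 'T1')
)
, detail3(pid, quality) AS (
   VALUES (1000, 'Q1'), (1000, 'Q1'), (1000, 'Q1')
)
SELECT count(*) AS joined_rows   -- 5 * 4 * 3 = 60 rows, not 5, 4 or 3
FROM   detail1 d1
JOIN   detail2 d2 USING (pid)
JOIN   detail3 d3 USING (pid);
```

Any count computed on top of this join would be inflated by the row counts of the other two tables, which is why each detail table is reduced to one row per pid before joining.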

More explanation in this related answer:
Two SQL LEFT JOINS produce incorrect result

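I also use LEFT JOIN, even though you wrote that "each product has entries ...". It can't hurt. Otherwise, you would lose a whole company from the result if there are no related rows in one of the detail tables.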

I did the same for product, so you even get companies without any products.
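
One side effect of the outer joins is that a company without products, or without matching detail rows, gets NULL rather than 0 from sum(). If zeros are preferred, wrapping the sums in COALESCE is one option. The sketch below is an illustration only: company 102 is a hypothetical row that is not in the question, and the d2 CTE stands in for the pre-aggregated detail2 subquery.

```sql
WITH company(cid, cname) AS (
   VALUES (100, 'Company 1')
        , (102, 'Company 3')          -- hypothetical company without products
)
, product(pid, cid) AS (
   VALUES (1000, 100)
)
, d2(pid, d2_ct) AS (
   VALUES (1000, 4)                   -- stands in for the aggregated detail2 subquery
)
SELECT c.cid
     , c.cname
     , sum(d2.d2_ct)              AS d2_ct_raw   -- NULL for company 102
     , COALESCE(sum(d2.d2_ct), 0) AS d2_ct       -- 0 for company 102
FROM   company c
LEFT   JOIN product p USING (cid)
LEFT   JOIN d2        USING (pid)
GROUP  BY c.cid, c.cname
ORDER  BY c.cid;
```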

Here is an explanation of how the partial count with count(state = 'A' OR NULL) works:
Compute percents from SUM() in the same SELECT sql query
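
In short, count() only counts non-null values, and the expression `state = 'A' OR NULL` turns every non-match into NULL. A small sketch of the three-valued logic, followed by a comparison with the aggregate FILTER clause, which should be equivalent but assumes PostgreSQL 9.4 or later:

```sql
-- How "condition OR NULL" evaluates:
SELECT (true  OR NULL)          AS a_matches      -- true: counted
     , (false OR NULL)          AS a_no_match     -- NULL: not counted
     , (NULL::boolean OR NULL)  AS state_is_null; -- NULL: not counted

-- The same partial count written two ways, on the states of product 1000:
SELECT count(state = 'A' OR NULL)           AS d1_a_ct         -- 2
     , count(*) FILTER (WHERE state = 'A')  AS d1_a_ct_filter  -- 2, needs PostgreSQL 9.4+
FROM  (VALUES ('A'), ('B'), ('B'), ('B'), ('A')) t(state);
```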

Restricting further on the datetime column is simple. I added a commented-out WHERE clause. Note the use of >= and < to avoid a common mistake with timestamp ranges.
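
The linked explanation is not reproduced here, so the following is only one plausible illustration of the mistake: BETWEEN includes both bounds, so a row that falls exactly on a boundary is counted in two adjacent ranges, while the half-open form (>= lower, < upper) assigns it to exactly one. The two timestamps are made up for the example.

```sql
WITH t(ts) AS (
   VALUES (timestamp '2013-06-03 23:59:59.5')
        , (timestamp '2013-06-04 00:00:00')    -- exactly on the boundary
)
SELECT count((ts BETWEEN '2013-06-03' AND '2013-06-04') OR NULL) AS day1_between    -- 2
     , count((ts BETWEEN '2013-06-04' AND '2013-06-05') OR NULL) AS day2_between    -- 1, boundary row counted twice
     , count((ts >= '2013-06-03' AND ts < '2013-06-04') OR NULL) AS day1_half_open  -- 1
     , count((ts >= '2013-06-04' AND ts < '2013-06-05') OR NULL) AS day2_half_open  -- 1
FROM   t;
```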