Group By + Joins

Time: 2011-06-17 15:22:25

Tags: sql group-by left-join

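Hello, I'm having a problem using GROUP BY together with joins across 3 tables.

I have a project table with various fields, including a project code field. I also have an invoice table and an hours table, each of which can hold multiple rows per project. Both of those tables carry the project code.

The two SUM values are not being calculated correctly, and I'd really like to know where the problem lies.

Here is the SQL I'm using: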

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM  dbo.salesforce 
    RIGHT OUTER JOIN dbo.project 
        ON dbo.salesforce.projectcode = dbo.project.projectcode 
    LEFT OUTER JOIN dbo.harvest 
        ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.salesforce.projectcode, 
            dbo.harvest.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

Any help or tips with this would be greatly appreciated!

2 answers:

Answer 0: (score: 1)

A mini-Cartesian product happens whenever each of the two tables, dbo.salesforce and dbo.harvest, has more than one match per projectcode. Here is a simple example. Suppose there are tables A, B and C, as follows:

  • A

    AID  AVALUE
    ---  -------
    1    ValueA1
    2    ValueA2

  • B

    BID  BVALUE   AID
    ---  -------  ---
    1    ValueB1  1
    2    ValueB2  1
    3    ValueB3  2

  • C

    CID  CVALUE   AID
    ---  -------  ---
    1    ValueC1  1
    2    ValueC2  1
    3    ValueC3  1

Now, if we perform this join:

SELECT * FROM A JOIN B ON A.AID = B.AID

the result will be:

AID  AVALUE   BID  BVALUE   AID
---  -------  ---  -------  ---
1    ValueA1  1    ValueB1  1
1    ValueA1  2    ValueB2  1
2    ValueA2  3    ValueB3  2

Now the join is:

SELECT * FROM A JOIN B ON A.AID = B.AID JOIN C ON A.AID = C.AID

What would the result be? Here:

AID  AVALUE   BID  BVALUE   AID  CID  CVALUE   AID
---  -------  ---  -------  ---  ---  -------  ---
1    ValueA1  1    ValueB1  1    1    ValueC1  1
1    ValueA1  1    ValueB1  1    2    ValueC2  1
1    ValueA1  1    ValueB1  1    3    ValueC3  1
1    ValueA1  2    ValueB2  1    1    ValueC1  1
1    ValueA1  2    ValueB2  1    2    ValueC2  1
1    ValueA1  2    ValueB2  1    3    ValueC3  1

As you can see, every match from B is repeated three times, because that is how many matches C has. Similarly, every match from C is repeated twice, because that is how many matches there are in B. And of course the "luckiest" is the row from A, since it is repeated 2×3 = 6 times. (The A row with AID = 2 drops out because C has no match for it.) This is a Cartesian-like join, and it is what happens in your case: every SUM adds up the repeated rows.

I'm not sure whether it is considered canonical, but in cases like this I often group each table separately by the join expression and then join the result sets. Your query would look like this:

SELECT  p.projectcode, 
        p.client, 
        p.project, 
        p.budget, 
        p.budget * 80 AS value, 
        h.hourslogged, 
        s.invoiced
FROM  dbo.project p
    LEFT JOIN (
        SELECT  projectcode, 
                SUM(dbo.salesforce.value) AS invoiced
        FROM  dbo.salesforce
        GROUP BY projectcode
    ) s ON p.projectcode = s.projectcode
    LEFT JOIN (
        SELECT  projectcode, 
                SUM(dbo.harvest.hours) AS hourslogged
        FROM  dbo.harvest
        GROUP BY projectcode
    ) h ON p.projectcode = h.projectcode
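The double counting, and the effect of aggregating each table before joining, can be checked on a small data set. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (so the dbo. prefix is dropped), the tables are cut down to the columns that matter, and all rows are made-up sample data rather than anything from the question.

```python
import sqlite3

# Hypothetical sample data: project P1 has 2 invoices and 3 time entries,
# project P2 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project    (projectcode TEXT, budget REAL);
CREATE TABLE salesforce (projectcode TEXT, value REAL);
CREATE TABLE harvest    (projectcode TEXT, hours REAL);
INSERT INTO project    VALUES ('P1', 100), ('P2', 50);
INSERT INTO salesforce VALUES ('P1', 1000), ('P1', 500);
INSERT INTO harvest    VALUES ('P1', 2), ('P1', 3), ('P1', 5);
""")

# Joining both child tables directly yields 2 x 3 = 6 rows for P1,
# so each invoice is summed 3 times and each time entry twice.
naive = conn.execute("""
    SELECT p.projectcode, SUM(h.hours), SUM(s.value)
    FROM project p
    LEFT JOIN salesforce s ON s.projectcode = p.projectcode
    LEFT JOIN harvest h    ON h.projectcode = p.projectcode
    GROUP BY p.projectcode
    ORDER BY p.projectcode
""").fetchall()

# Aggregating each child table first leaves at most one row per project
# on each side of the join, so nothing is repeated.
fixed = conn.execute("""
    SELECT p.projectcode, h.hourslogged, s.invoiced
    FROM project p
    LEFT JOIN (SELECT projectcode, SUM(value) AS invoiced
               FROM salesforce GROUP BY projectcode) s
           ON s.projectcode = p.projectcode
    LEFT JOIN (SELECT projectcode, SUM(hours) AS hourslogged
               FROM harvest GROUP BY projectcode) h
           ON h.projectcode = p.projectcode
    ORDER BY p.projectcode
""").fetchall()

print("naive:", naive)  # P1 shows 20 hours and 4500 invoiced
print("fixed:", fixed)  # P1 shows 10 hours and 1500 invoiced
```

With this sample data the direct join reports twice the real hours and three times the real invoiced amount for P1, while the pre-aggregated version reports the true totals. P2 comes back with NULL (None in Python) for both sums in either query, since it has no child rows.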

Answer 1: (score: 0)

I would suggest avoiding a mix of left and right outer joins. Your central table is Project, so start with it.

SELECT  dbo.project.projectcode, 
        dbo.project.client, 
        dbo.project.project, 
        dbo.project.budget, 
        dbo.project.budget * 80 AS value, 
        SUM(dbo.harvest.hours) AS hourslogged, 
        SUM(dbo.salesforce.value) AS invoiced
FROM    dbo.project      
            LEFT OUTER JOIN dbo.salesforce
                ON dbo.salesforce.projectcode = dbo.project.projectcode 
            LEFT OUTER JOIN dbo.harvest 
                ON dbo.project.projectcode = dbo.harvest.projectcode
GROUP BY    dbo.project.projectcode, 
            dbo.project.project, 
            dbo.project.client, 
            dbo.project.budget

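But the error comes from the GROUP BY. You must not group by the two tables you are aggregating, otherwise your aggregates will not come out right!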