根据另一列的不同值聚合一列

时间:2010-12-30 16:30:48

标签: sql sql-server sql-server-2008

此方案基于another question中的架构,我对任何有关架构有效性的讨论都不感兴趣!

我很想知道SQL Server中是否有任何好的技术可以根据另一列(amount1)的不同值执行一列(下面id1)的聚合。 / p>

下面的Plan1扫描table1两次,通过p_id执行两次聚合,然后将结果连接在一起。似乎可以改进。在某些情况下,查询2可能会返回错误的结果,无论如何计划都会更糟!

有什么想法吗?

DDL

IF OBJECT_ID('tempdb..#table1') IS NOT NULL DROP TABLE #table1;
IF OBJECT_ID('tempdb..#table2') IS NOT NULL DROP TABLE #table2;

CREATE TABLE #table1 (id1 int primary key nonclustered, amount1 int, p_id int);
CREATE CLUSTERED INDEX ix ON #table1 (p_id,id1);
INSERT INTO #table1
SELECT 1,500,10 UNION ALL
SELECT 2,700,20 UNION ALL
SELECT 3,500,10 UNION ALL
SELECT 4,450,20 UNION ALL
SELECT 5,300,10;

CREATE TABLE #table2 (id2 int primary key, amount2 int, id1 int);
INSERT INTO #table2
SELECT 1,300,1 UNION ALL
SELECT 2,200,1 UNION ALL
SELECT 3,200,2 UNION ALL
SELECT 4,500,2 UNION ALL
SELECT 5,400,3 UNION ALL
SELECT 6,150,4 UNION ALL
SELECT 7,300,4 UNION ALL
SELECT 8,300,5;

查询1

WITH t1
     AS (SELECT p_id,SUM(amount1) AS total1
         FROM   #table1
         GROUP  BY p_id),
     t2
     AS (SELECT p_id,SUM(amount2) AS total2
         FROM   #table2 table2
                JOIN #table1 table1
                  ON table1.id1 = table2.id1
         GROUP  BY p_id)
SELECT t1.p_id,total1,total2
FROM   t1
       JOIN t2
         ON t1.p_id = t2.p_id  

计划1

Execution Plan 1

查询2

SELECT table1.p_id, 
       FLOOR(SUM(DISTINCT amount1 + table1.id1/100000000.0)) AS total1, 
       SUM(amount2) AS total2
FROM #table1 table1 JOIN #table2 table2 ON table1.id1=table2.id1
GROUP BY table1.p_id

计划2

Execution Plan 1

2 个答案:

答案 0 :(得分:2)

这个只扫描一个表中的每个记录一次:

SELECT  p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM    #table1 t1
CROSS APPLY
        (
        SELECT  SUM(amount2) AS s_amount2
        FROM    #table2 t2
        WHERE   t2.id1 = t1.id1
        ) t2
GROUP BY
        p_id

  |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END, [Expr1007]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
       |--Stream Aggregate(GROUP BY:([t1].[p_id]) DEFINE:([Expr1026]=COUNT_BIG([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1027]=SUM([tempdb].[dbo].[#table1].[amount1] as [t1].[amount1]), [Expr1028]=COUNT_BIG([Expr1005]), [Expr1029]=SUM([Expr1005])))
            |--Nested Loops(Left Outer Join, OUTER REFERENCES:([t1].[id1]))
                 |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table1] AS [t1]), ORDERED FORWARD)
                 |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END))
                      |--Stream Aggregate(DEFINE:([Expr1024]=COUNT_BIG([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2]), [Expr1025]=SUM([tempdb].[dbo].[#table2].[amount2] as [t2].[amount2])))
                           |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[#table2] AS [t2]), WHERE:([tempdb].[dbo].[#table2].[id1] as [t2].[id1]=[tempdb].[dbo].[#table1].[id1] as [t1].[id1]))

,虽然它不一定更有效率。

这一个:

SELECT  p_id, SUM(amount1) AS total1, SUM(s_amount2) AS total2
FROM    #table1 t1
JOIN    (
        SELECT  id1, SUM(amount2) AS s_amount2
        FROM    #table2
        GROUP BY
                id1
        ) t2
ON      t2.id1 = t1.id1
GROUP BY
        p_id

将为联接提供更多选项,但是,如果将选择t2,则可在计划中使用额外的假脱机。

答案 1 :(得分:2)

嗯,@ Quassnoi解决方案似乎相当不错。在任何情况下,对于SQL Server 2005+,您可以使用PARTITION BY子句尝试进行更简单的查询,但执行计划并不是更好,尽管它并不一定意味着效率更高或更低。 / p>

SELECT A.p_id, MIN(amount1) total1, SUM(amount2) total2
FROM (SELECT p_id, id1, SUM(amount1) OVER(PARTITION BY p_id) amount1 FROM #table1) A
JOIN #table2 B
ON A.id1 = B.id1
GROUP BY A.p_id