Select the top N, group, and sum the others using table variables

Time: 2013-04-10 23:51:19

Tags: sql sql-server sql-server-2012

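I looked at several other questions trying to find an answer, but I couldn't. Here's the thing: I have a really big table, and it will keep growing indefinitely. When I say BIG, I mean that a query over 6 hours of data covers about 10 million rows. We have months of data, so you can see how big it is.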

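Well, having justified the concern about size, I want to run a very simple query: group by one column and sum the values of another column. From that, I want the 10 largest sums, plus the total of everything that is not in the top 10. I know there are ways to do this, but I want to do it without computing the aggregated table twice. For that, I used table variables. I am using SQL Server 2012.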

DECLARE @sumsTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @topTable TABLE(operationName varchar(200), operationAmount int)
DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER

SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'

--grouping by operationName and counting occurrences
INSERT INTO @sumsTable
SELECT operationName, COUNT(*) AS operationAmount
FROM [f6f87bf0-33ab-4882-8674-2cb31e5e49c4]
WHERE (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
GROUP BY operationName

--selecting top occurrences
INSERT INTO @topTable
SELECT TOP(@top) * FROM @sumsTable
ORDER BY operationAmount DESC

--Summing others and making union with top
SELECT 'OTHER' AS operationName, SUM(operationAmount) as operationAmount FROM @sumsTable
WHERE operationName NOT IN (SELECT operationName FROM @topTable)
UNION
SELECT * FROM @topTable
ORDER BY operationAmount DESC

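My question is whether this is a good approach, or whether there is a better, faster way. Am I committing any sin here? Can I get rid of the table variables without aggregating everything more than once?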

2 answers:

Answer 0: (score: 2)

You can do this without temporary tables:

DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER

SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'

select 
      (case when y.RowID > @top then 'OTHER' else y.operationName end) as operationName,
      sum(y.operationAmount) as operationAmount
from
(
    select 
           row_number() over(order by count(*) desc) as RowID, 
           x.operationName, 
           count(*) AS operationAmount
    from [f6f87bf0-33ab-4882-8674-2cb31e5e49c4] as x
    where (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
    group by x.operationName
)
as y
group by (case when y.RowID > @top then 'OTHER' else y.operationName end)
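
To try the pattern above without touching the real table, here is a minimal sketch. The `@ops` table variable and its rows are hypothetical sample data, not from the question, and `@top` is set to 2 only to keep the output small. With this data the top two operations are `read` (4) and `write` (3), and the remaining two operations fold into a single `OTHER` row with a total of 2.

```sql
-- Hypothetical sample data standing in for the real log table.
DECLARE @ops TABLE (operationName varchar(200));

INSERT INTO @ops (operationName)
VALUES ('read'), ('read'), ('read'), ('read'),
       ('write'), ('write'), ('write'),
       ('delete'),
       ('list');

DECLARE @top int = 2;  -- assumed value, small on purpose

SELECT
    CASE WHEN y.RowID > @top THEN 'OTHER' ELSE y.operationName END AS operationName,
    SUM(y.operationAmount) AS operationAmount
FROM
(
    -- One pass over the data: count per operation and rank the counts.
    SELECT
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS RowID,
        x.operationName,
        COUNT(*) AS operationAmount
    FROM @ops AS x
    GROUP BY x.operationName
) AS y
-- Rows ranked past @top collapse into one 'OTHER' group.
GROUP BY CASE WHEN y.RowID > @top THEN 'OTHER' ELSE y.operationName END
ORDER BY SUM(y.operationAmount) DESC;
```

The outer `ORDER BY` is an addition that mirrors the sort in the question's original query. Note that it sorts `OTHER` by its total like any other row, so on real data it may not come last.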

Answer 1: (score: 0)

With the following SQL, you only need to aggregate the original table once

instead of

row_number() over(order by count(*) desc) as RowID, x.operationName, count(*) AS operationAmount

which would compute count(*) twice

DECLARE @startTime DATETIME
DECLARE @endTime DATETIME
DECLARE @top INTEGER

SET @top = 10
SET @endTime = '03/11/2013'
SET @startTime = '03/10/2013'

;WITH cte AS ( -- aggregate the base table once: one count per operation
    SELECT operationName, COUNT(*) AS operationAmount
    FROM [f6f87bf0-33ab-4882-8674-2cb31e5e49c4]
    WHERE (TIMESTAMP >= @startTime) AND (TIMESTAMP <= @endTime)
    GROUP BY operationName
),
cte1 AS ( -- rank the per-operation totals, largest first
    SELECT operationName, operationAmount, ROW_NUMBER() OVER (ORDER BY operationAmount DESC) AS RN
    FROM cte
) -- keep the top @top operations and fold the rest into 'Others'
SELECT (CASE WHEN RN <= @top THEN operationName ELSE 'Others' END) AS Name, SUM(operationAmount) AS operationAmount
FROM cte1
GROUP BY (CASE WHEN RN <= @top THEN operationName ELSE 'Others' END)
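
If the display order matters (the question's original query sorted by operationAmount DESC), one option is to carry the rank through the outer aggregation and sort on it. The following is only a sketch under assumptions: `@ops` and its rows are hypothetical sample data, `@top` is set to 2 for brevity, and `operationName` is added as a tie-breaker in the ranking so that equal counts are ranked deterministically.

```sql
-- Hypothetical sample data standing in for the real log table.
DECLARE @ops TABLE (operationName varchar(200));

INSERT INTO @ops (operationName)
VALUES ('read'), ('read'), ('read'), ('read'),
       ('write'), ('write'), ('write'),
       ('delete'),
       ('list');

DECLARE @top int = 2;  -- assumed value, small on purpose

WITH cte AS (   -- aggregate once: one count per operation
    SELECT operationName, COUNT(*) AS operationAmount
    FROM @ops
    GROUP BY operationName
),
cte1 AS (       -- rank the totals; operationName breaks ties (an assumption)
    SELECT operationName, operationAmount,
           ROW_NUMBER() OVER (ORDER BY operationAmount DESC, operationName) AS RN
    FROM cte
)
SELECT CASE WHEN RN <= @top THEN operationName ELSE 'Others' END AS Name,
       SUM(operationAmount) AS operationAmount
FROM cte1
GROUP BY CASE WHEN RN <= @top THEN operationName ELSE 'Others' END
-- Each top row keeps its own rank; the 'Others' group has MIN(RN) = @top + 1,
-- so it always sorts after the top rows.
ORDER BY MIN(RN);
```

Sorting on `MIN(RN)` rather than on the summed amount keeps `Others` at the bottom even when its total exceeds some of the top rows, which is likely on a table with many distinct operations.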