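SQL Server 2012 seems to have introduced CUME_DIST() and PERCENT_RANK, which are used to compute the cumulative distribution of a column. Is there an equivalent in SQL Server 2008 that achieves the same thing?
Answer 0 (score: 3)
Never say never in SQL.
The statement: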
select percent_rank() over (partition by <x> order by <y>)
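is basically equivalent to:
select row_number() over (partition by <x> order by <y>) * 1.0 / count(*) over (partition by <x>)
"Basically" here means that it works when there are no duplicates in the data. Even with duplicates, it should be close enough.
The "real" answer is that it is equivalent to: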
select row_number() over (partition by <x> order by <y>) / count(distinct <y>) over (partition by <x>)
However, count(distinct) is not available as a window function, and expressing it in 2008 is painful unless you really need it.
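The function cume_dist() is harder, because it requires a cumulative sum, and in 2008 that takes a self-join. An approximation, assuming no duplicates:
with t as (select <x>, <y>,
                  row_number() over (partition by <x> order by <y>) as seqnum
           from <table>
          )
-- running total of <y> as a share of the partition total;
-- note that CUME_DIST itself is a share of the row count, not of the <y> values
select s.*, s.sumy * 1.0 / sum(s.<y>) over (partition by s.<x>)
from (select t.<x>, t.<y>, t.seqnum, sum(tprev.<y>) as sumy
      from t left outer join
           t tprev
           on t.<x> = tprev.<x> and t.seqnum >= tprev.seqnum
      group by t.<x>, t.<y>, t.seqnum
     ) s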
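For comparison, the usual definitions are: PERCENT_RANK is (RANK - 1) / (rows in the partition - 1), and CUME_DIST is the number of rows with a value less than or equal to the current one, divided by the rows in the partition. Both can be written with plain correlated subqueries, which need no window functions at all. The following is a minimal sketch under stated assumptions: the table scores(grp, val) is hypothetical and stands in for <table>(<x>, <y>), and Python's sqlite3 is used only as a convenient way to run the query. The SELECT is meant to be portable to SQL Server 2008, but that has not been verified here.

```python
import sqlite3

# Hypothetical table scores(grp, val) stands in for <table>(<x>, <y>).
# Assumes every partition has at least two rows (pr divides by rows - 1).
SQL = """
SELECT s.grp, s.val,
       -- percent_rank: rows strictly below the current value / (rows - 1)
       1.0 * (SELECT COUNT(*) FROM scores p
              WHERE p.grp = s.grp AND p.val < s.val)
           / ((SELECT COUNT(*) FROM scores p WHERE p.grp = s.grp) - 1) AS pr,
       -- cume_dist: rows at or below the current value / rows
       1.0 * (SELECT COUNT(*) FROM scores p
              WHERE p.grp = s.grp AND p.val <= s.val)
           / (SELECT COUNT(*) FROM scores p WHERE p.grp = s.grp) AS cd
FROM scores s
ORDER BY s.grp, s.val
"""


def distribution(rows):
    """Load (grp, val) rows into an in-memory table and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scores (grp TEXT, val INTEGER)")
    con.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


# One partition with a tie on 20.
for row in distribution([("a", 10), ("a", 20), ("a", 20), ("a", 40)]):
    print(row)
```

The tied rows share one pr and one cd value, which is the behaviour the row_number()-based approximation cannot reproduce. The cost is a subquery per row, so this form is only practical for modest row counts.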
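Answer 1 (score: 1)
No equivalent function exists before 2012, but one possible workaround involves a recursive CTE, at least for datasets of fewer than 32767 rows. Here, a pair of dice is thrown 30 times: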
SET NOCOUNT ON;
DECLARE @t TABLE(i INT);
DECLARE @i INT=0;
WHILE @i<30 BEGIN
INSERT INTO @t VALUES (CAST(RAND()*6 AS INT)+1 + CAST(RAND()*6 AS INT)+1);
SET @i+=1;
END
DECLARE @tc INT; SELECT @tc=COUNT(*) FROM @t;
WITH a AS (
SELECT *
, d=CAST(COUNT(1)OVER(PARTITION BY i) AS DECIMAL(5,2)) / @tc -- before 2012, an aggregate OVER() cannot take ORDER BY
, r=ROW_NUMBER()OVER(ORDER BY i)
, pr=CAST((RANK()OVER(ORDER BY i)-1)AS DECIMAL(5,2)) / (@tc - 1)
FROM @t
)
, rcte (i, d, r, cd, pr) AS (
SELECT i, d, r, d, pr
FROM a
WHERE r=1
UNION ALL
SELECT a.i, a.d, a.r
, CASE WHEN rcte.i<>a.i THEN CAST(rcte.cd+a.d AS DECIMAL(5,2)) ELSE rcte.cd END
, a.pr
FROM a
INNER JOIN rcte ON rcte.r + 1 = a.r
)
SELECT i,cd,pr FROM rcte
OPTION (MAXRECURSION 32767)
Result:
i cd pr
----------- --------------------------------------- ---------------------------------------
2 0.0333333333333 0.0000000000000
3 0.0700000000000 0.0344827586206
4 0.2400000000000 0.0689655172413
4 0.2400000000000 0.0689655172413
4 0.2400000000000 0.0689655172413
4 0.2400000000000 0.0689655172413
4 0.2400000000000 0.0689655172413
5 0.3100000000000 0.2413793103448
5 0.3100000000000 0.2413793103448
6 0.3800000000000 0.3103448275862
6 0.3800000000000 0.3103448275862
7 0.5100000000000 0.3793103448275
7 0.5100000000000 0.3793103448275
7 0.5100000000000 0.3793103448275
7 0.5100000000000 0.3793103448275
8 0.6100000000000 0.5172413793103
8 0.6100000000000 0.5172413793103
8 0.6100000000000 0.5172413793103
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
9 0.8400000000000 0.6206896551724
10 0.8700000000000 0.8620689655172
11 0.9700000000000 0.8965517241379
11 0.9700000000000 0.8965517241379
11 0.9700000000000 0.8965517241379
12 1.0000000000000 1.0000000000000
Here is the SQL 2012 equivalent of the CTE above:
SELECT *
, cd=CUME_DIST()OVER(ORDER BY i)
, pr=PERCENT_RANK()OVER(ORDER BY i)
FROM @t;
Answer 2 (score: 0)
This gets pretty close. First, some sample data:
USE tempdb;
GO
CREATE TABLE dbo.DartScores
(
TournamentID INT,
PlayerID INT,
Score INT
);
INSERT dbo.DartScores VALUES
(1, 1, 320),
(1, 2, 340),
(1, 3, 310),
(1, 4, 370),
(2, 1, 310),
(2, 2, 280),
(2, 3, 370),
(2, 4, 370);
Now, the 2012 version of the query:
SELECT TournamentID, PlayerID, Score,
pr = PERCENT_RANK() OVER (PARTITION BY TournamentID ORDER BY Score),
cd = CUME_DIST() OVER (PARTITION BY TournamentID ORDER BY Score)
FROM dbo.DartScores
ORDER BY TournamentID, pr;
Which produces this result:
TournamentID PlayerID Score pr cd
1 3 310 0 0.25
1 1 320 0.333333333333333 0.5
1 2 340 0.666666666666667 0.75
1 4 370 1 1
2 2 280 0 0.25
2 1 310 0.333333333333333 0.5
2 3 370 0.666666666666667 1
2 4 370 0.666666666666667 1
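The 2005 equivalent gets close, but it does not handle ties very well. Sorry, I am out of gas tonight, otherwise I would help figure out why. I know about as much as I learned from Itzik's new High Performance window function book.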
;WITH cte AS
(
SELECT TournamentID, PlayerID, Score,
rk = RANK() OVER (PARTITION BY TournamentID ORDER BY Score),
rn = COUNT(*) OVER (PARTITION BY TournamentID)
FROM dbo.DartScores
)
SELECT TournamentID, PlayerID, Score,
pr = 1e0*(rk-1)/(rn-1),
cd = 1e0*(SELECT COALESCE(MIN(cte2.rk)-1, cte.rn)
FROM cte AS cte2 WHERE cte2.rk > cte.rk) / rn
FROM cte;
Which produces this result (note how the cume_dist values change slightly for the ties):
TournamentID PlayerID Score pr cd
1 3 310 0 0.25
1 1 320 0.333333333333333 0.5
1 2 340 0.666666666666667 0.75
1 4 370 1 1
2 2 280 0 0.25
2 1 310 0.333333333333333 0.5
2 3 370 0.666666666666667 0.75
2 4 370 0.666666666666667 0.75
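The tie discrepancy appears to come from the correlated subquery rather than from the formula. The subquery looks for the next higher rank across all tournaments, so the tied 370 rows in tournament 2 pick up rank 4 from tournament 1 and end at (4 - 1) / 4 = 0.75. Restricting the subquery to the current TournamentID should remove the difference. Here is a sketch of that variant, run through Python's sqlite3 purely as a stand-in engine. That choice is an assumption made so the query can be tested. It needs SQLite 3.25 or later for window functions, the dbo. prefix is dropped, and the T-SQL `alias = expr` form is written as `expr AS alias`.

```python
import sqlite3

# Same sample rows as dbo.DartScores above.
ROWS = [
    (1, 1, 320), (1, 2, 340), (1, 3, 310), (1, 4, 370),
    (2, 1, 310), (2, 2, 280), (2, 3, 370), (2, 4, 370),
]

SQL = """
WITH cte AS (
    SELECT TournamentID, PlayerID, Score,
           RANK()   OVER (PARTITION BY TournamentID ORDER BY Score) AS rk,
           COUNT(*) OVER (PARTITION BY TournamentID)                AS rn
    FROM DartScores
)
SELECT c.TournamentID, c.PlayerID, c.Score,
       1.0 * (c.rk - 1) / (c.rn - 1) AS pr,
       -- next higher rank within the SAME tournament; if there is none,
       -- the row is in the top tie group and covers all rn rows
       1.0 * (SELECT COALESCE(MIN(c2.rk) - 1, c.rn)
              FROM cte AS c2
              WHERE c2.TournamentID = c.TournamentID
                AND c2.rk > c.rk) / c.rn AS cd
FROM cte AS c
ORDER BY c.TournamentID, c.Score, c.PlayerID
"""


def corrected_rows():
    """Build the sample table in memory and run the partition-aware query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE DartScores (TournamentID INT, PlayerID INT, Score INT)"
    )
    con.executemany("INSERT INTO DartScores VALUES (?, ?, ?)", ROWS)
    result = con.execute(SQL).fetchall()
    con.close()
    return result


for row in corrected_rows():
    print(row)
```

With the extra join condition, both 370 rows in tournament 2 get cd = 1, which matches the CUME_DIST() output of the 2012 query on the same data.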
Don't forget to clean up:
DROP TABLE dbo.DartScores;
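Answer 3 (score: 0)
Yes, there is a simple solution, at least for the percent_rank() part. You can use
1.0 * (rank() over (partition by <x> order by <y>) - 1) / (count(*) over (partition by <x>) - 1)
This gives you exactly the same result as
percent_rank() over (partition by <x> order by <y>)
as long as each partition has more than one row; the 1.0 * keeps SQL Server from performing integer division. rank() is one of the few analytic functions that already exist in SQL Server 2008.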
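Two details of this formula are easy to check on a small example. rank() and count(*) both return integers, so without a non-integer operand the quotient is truncated, and a one-row partition makes the denominator zero. A minimal sketch, assuming a hypothetical table scores(grp, val) and using Python's sqlite3 as a stand-in engine (SQLite 3.25 or later). SQLite also truncates integer division, but it returns NULL on division by zero, where SQL Server raises an error.

```python
import sqlite3

SQL = """
SELECT grp, val,
       -- integer operands: the quotient is truncated
       (RANK() OVER (PARTITION BY grp ORDER BY val) - 1)
           / (COUNT(*) OVER (PARTITION BY grp) - 1) AS pr_int,
       -- 1.0 * forces a non-integer division; NULLIF/COALESCE map the
       -- one-row partition (denominator 0) to 0
       COALESCE(1.0 * (RANK() OVER (PARTITION BY grp ORDER BY val) - 1)
                    / NULLIF(COUNT(*) OVER (PARTITION BY grp) - 1, 0), 0) AS pr
FROM scores
ORDER BY grp, val
"""


def percent_rank_demo():
    """Run both variants over a tied partition 'a' and a one-row partition 'b'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE scores (grp TEXT, val INTEGER)")
    con.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("a", 10), ("a", 20), ("a", 20), ("a", 40), ("b", 5)],
    )
    result = con.execute(SQL).fetchall()
    con.close()
    return result


for row in percent_rank_demo():
    print(row)
```

Returning 0 for the one-row partition mirrors what PERCENT_RANK() reports for the first row of any partition. Whether to return 0 or NULL there is a design choice for the caller.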