T-SQL unequal deciles based on totals in another column

Time: 2018-01-25 13:03:20

Tags: sql tsql

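I haven't found the exact answer I'm looking for. I have a table with an ID and two value columns. I need to sort the first value column from low to high and then split the list into deciles so that each decile has an equal (or nearly equal) total of value2. Here is an example that uses quartiles to save space: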

I have:

ID  value1  value2      
1     2      132        
2     6      182        
3     5      195        
4     8      152        
5     3      132        
6     9      129        
7     3      180        
8     9      120        
9     3      172        
10    6      192        
11    9      177        
12    12     151        

Each quartile should total about 478.5.
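The 478.5 target is just the value2 total divided by the number of quartiles. A minimal sketch of that arithmetic, assuming the sample rows are loaded into a table variable named @Table1 (a name chosen here for illustration):

```sql
-- Sample data from the question, loaded into a table variable (illustrative name)
DECLARE @Table1 TABLE (ID INT, value1 INT, value2 INT);
INSERT INTO @Table1 VALUES
 (1,2,132),(2,6,182),(3,5,195),(4,8,152),(5,3,132),(6,9,129)
,(7,3,180),(8,9,120),(9,3,172),(10,6,192),(11,9,177),(12,12,151);

SELECT SUM(value2)       AS SumTotal        -- 1914
      ,SUM(value2) / 4.0 AS QuartileTarget  -- 478.5 (dividing by 4.0 avoids integer division)
FROM @Table1;
```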

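Sorting by value1 gives the following, but I need to be able to assign the quartiles so that each one totals about 478.5. I entered the sample quartiles by hand, so they may or may not match what a calculation would produce.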

ID  value1  value2  Qtle    
1     2      132      1 
5     3      132      1 
7     3      180      1 
9     3      172      2 
3     5      195      2 
2     6      182      3 
10    6      192      3 
4     8      152      3 
6     9      129      4 
8     9      120      4 
11    9      177      4 
12    12     151      4 

Sorry about the formatting, this is my first post.

Edit 1: I think I may have solved it, although it may not be very elegant.

Edit 2: Added the sample quartiles above and fixed the code below to reflect quartiles instead of deciles. Also corrected the sum of value2.

SELECT value1
      ,value2
      ,SUM(value2) OVER (ORDER BY value1) AS CumSum
      ,CASE
           -- Divide by 4.0 rather than 4 so the thresholds are not truncated by integer division
           WHEN SUM(value2) OVER (ORDER BY value1) <     (SELECT SUM(value2) FROM Table1) / 4.0 THEN 1
           WHEN SUM(value2) OVER (ORDER BY value1) < 2 * (SELECT SUM(value2) FROM Table1) / 4.0 THEN 2
           WHEN SUM(value2) OVER (ORDER BY value1) < 3 * (SELECT SUM(value2) FROM Table1) / 4.0 THEN 3
           ELSE 4
       END AS Quartile
FROM Table1

1 answer:

Answer 0 (score: 0)

I hope I got this right...

The following is a generic approach. You can set the number of tiles with the variable @TileCount.

DECLARE @Table1 TABLE(ID INT,value1 INT,value2 INT);
INSERT INTO @Table1 VALUES      
 (1,2,132)        
,(2,6,182)        
,(3,5,195)        
,(4,8,152)        
,(5,3,132)        
,(6,9,129)        
,(7,3,180)        
,(8,9,120)        
,(9,3,172)        
,(10,6,192)        
,(11,9,177)        
,(12,12,151);

DECLARE @TileCount INT=4;

WITH Sums AS
(
    -- One row per tile: its rank and the cumulative upper bound (SumPart) for that tile
    SELECT TOP (@TileCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS TileRank
              ,A.SumTotal
              ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) * (A.SumTotal / CAST(@TileCount AS FLOAT)) AS SumPart
    FROM master..spt_values -- used only as a row source for the running number
    CROSS APPLY(SELECT (SELECT SUM(value2) FROM @Table1) AS SumTotal) AS A
)
,AddCumSum AS
(
    -- Running sum of value2 in value1 order
    SELECT value1
          ,value2
          ,SUM(value2) OVER (ORDER BY value1) AS CumSum
    FROM @Table1
)
-- For each row, pick the first tile whose upper bound covers the running sum
SELECT AddCumSum.*
      ,A.SumPart
      ,A.TileRank AS Tile
FROM AddCumSum
OUTER APPLY(SELECT TOP 1 * FROM Sums WHERE CumSum<=SumPart ORDER BY TileRank ASC) AS A;

Result

+--------+--------+--------+---------+------+
| value1 | value2 | CumSum | SumPart | Tile |
+--------+--------+--------+---------+------+
| 2      | 132    | 132    | 478.5   | 1    |
+--------+--------+--------+---------+------+
| 3      | 132    | 616    | 957     | 2    |
+--------+--------+--------+---------+------+
| 3      | 180    | 616    | 957     | 2    |
+--------+--------+--------+---------+------+
| 3      | 172    | 616    | 957     | 2    |
+--------+--------+--------+---------+------+
| 5      | 195    | 811    | 957     | 2    |
+--------+--------+--------+---------+------+
| 6      | 182    | 1185   | 1435.5  | 3    |
+--------+--------+--------+---------+------+
| 6      | 192    | 1185   | 1435.5  | 3    |
+--------+--------+--------+---------+------+
| 8      | 152    | 1337   | 1435.5  | 3    |
+--------+--------+--------+---------+------+
| 9      | 120    | 1763   | 1914    | 4    |
+--------+--------+--------+---------+------+
| 9      | 129    | 1763   | 1914    | 4    |
+--------+--------+--------+---------+------+
| 9      | 177    | 1763   | 1914    | 4    |
+--------+--------+--------+---------+------+
| 12     | 151    | 1914   | 1914    | 4    |
+--------+--------+--------+---------+------+

Some explanation

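The CTE Sums computes a few values so that they can be used like named variables. It uses @TileCount in the TOP clause together with ROW_NUMBER(), selecting from master..spt_values. That is nothing more than a table filled with many rows. We are not interested in its values; we only need it as a row source for a running number.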
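To see in isolation what Sums produces, here is a small sketch of the threshold list. It makes two assumptions that are not in the answer: the total is hard-coded as 1914 (the value2 sum of the sample data), and sys.all_objects stands in for master..spt_values as the row source. Any table with at least @TileCount rows would do.

```sql
DECLARE @TileCount INT = 4;
DECLARE @SumTotal FLOAT = 1914;  -- assumption: value2 total of the sample data, hard-coded

WITH Numbers AS
(
    -- Running number 1..@TileCount; the source table's contents are irrelevant
    SELECT TOP (@TileCount) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS TileRank
    FROM sys.all_objects  -- assumption: swapped in for master..spt_values
)
SELECT TileRank
      ,TileRank * (@SumTotal / @TileCount) AS SumPart  -- 478.5, 957, 1435.5, 1914
FROM Numbers
ORDER BY TileRank;
```

Each SumPart is the cumulative upper bound of its tile, which is what the final query compares CumSum against.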

The second CTE, AddCumSum, returns the rows together with the running sum.
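One detail worth knowing: OVER (ORDER BY value1) without a frame clause defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on value1 share one running sum. That is why the three value1 = 3 rows in the result all show 616. If ties should be split row by row instead, one possible variation (an assumption on my part, not something the answer does) is to add ID as a tiebreaker and use a ROWS frame:

```sql
-- Only the first four rows in value1 order, to keep the comparison short
DECLARE @Table1 TABLE (ID INT, value1 INT, value2 INT);
INSERT INTO @Table1 VALUES (1,2,132),(5,3,132),(7,3,180),(9,3,172);

SELECT ID
      ,value1
      ,value2
       -- Default RANGE frame: tied value1 rows get the same sum (132, 616, 616, 616)
      ,SUM(value2) OVER (ORDER BY value1) AS CumSumRange
       -- ROWS frame with ID as tiebreaker: the sum grows row by row (132, 264, 444, 616)
      ,SUM(value2) OVER (ORDER BY value1, ID
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumSumRows
FROM @Table1
ORDER BY value1, ID;
```

Whether tied value1 rows should land in the same tile or be spread across tiles depends on the requirement; the question does not say.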

The final SELECT finds the smallest TileRank whose SumPart is still large enough for the row's running sum.
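Picking the smallest tile whose bound is at least CumSum is the same as rounding CumSum * @TileCount / SumTotal up to the next integer. A sketch of that alternative, which is not the answer's method and assumes every value2 is positive (so the total is non-zero and no running sum is 0):

```sql
DECLARE @Table1 TABLE (ID INT, value1 INT, value2 INT);
INSERT INTO @Table1 VALUES
 (1,2,132),(2,6,182),(3,5,195),(4,8,152),(5,3,132),(6,9,129)
,(7,3,180),(8,9,120),(9,3,172),(10,6,192),(11,9,177),(12,12,151);

DECLARE @TileCount INT = 4;

WITH AddCumSum AS
(
    SELECT value1
          ,value2
          ,SUM(value2) OVER (ORDER BY value1) AS CumSum   -- running sum, ties share a value
          ,SUM(value2) OVER ()                AS SumTotal -- grand total repeated on every row
    FROM @Table1
)
SELECT value1
      ,value2
      ,CumSum
       -- Smallest k with CumSum <= k * SumTotal / @TileCount
      ,CAST(CEILING(CumSum * CAST(@TileCount AS FLOAT) / SumTotal) AS INT) AS Tile
FROM AddCumSum
ORDER BY value1;
```

On the sample data this yields the same Tile values as the result table above. It avoids the helper row source, at the cost of being a little less explicit about the tile boundaries.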