SQL Server中按列的频率分布

时间:2017-01-18 20:27:55

标签: sql sql-server

我有一个包含以下结构的表

  year    age   score    weight
  ------------------------------
  2008    16    100      3
  2008    25    150      2
  2009    40    210      2
  2009    22    50       3
  2009    65    90       3

我需要按年龄找到分数的频率分布。

预期输出是按年龄分的分数:%: (不准确输出 - 仅用于描述。行总数为1)

                   0-50  51-100  101-150  151-200  201-250
 2008   16-35      0.3    0.2      0.1     0.4      0
 2008   36-40 
 2008   41-65
 2008   66+
 2008   All
 2009   16-35
 2009   36-40
 2009   41-65
 2009   66+

我想出了以下查询对一个小组的说法

select 
    yearofsale,
    sum(sum(case 
               when age between 16 and 35 AND score between 0 and 50 
                  then 1 
               else 0 
            end) * weight) / sum(case 
                                    when age between 16 and 35 and score between 0 and 50 
                                       then weight 
                                    else 1 end)
from 
    table
group by 
    yearofsale

但我很确定有一种更简单的方法可以做到这一点。有什么想法吗?

谢谢, 蜂

2 个答案:

答案 0 :(得分:1)

您可以先执行一个子查询,将年龄组和得分组信息添加到每条记录中。

然后你可以增加每年的总重量和年龄组。

之后,您将计算每个年龄和得分组的百分比。最后,您将结果转移到交叉表中:

select yearofsale, age_group, [0-50], [51-100], [101-150], [151-200], [201-250]
from (
    select yearofsale, age_group, score_group, 100.0*sum(weight) / min(total_weight) as pct
    from (
            select *, 
                   sum(weight) over (partition by yearofsale, age_group) as total_weight
            from (
                    select *,
                        case when age >= 66 then '66+'
                             when age >= 41 then '41-65'
                             when age >= 36 then '36-40'
                             when age >= 16 then '16-35' 
                        end as age_group,
                        case when score <  51 then '0-50'
                             when score < 101 then '51-100' 
                             when score < 151 then '101-150' 
                             when score < 201 then '151-200' 
                             when score < 251 then '201-250'
                        end as score_group
                    from
                        table) as base
            ) as base2
    group by yearofsale, age_group, score_group) as base3
pivot ( sum(pct)
    for score_group in ([0-50], [51-100], [101-150], [151-200], [201-250])
) as pivotTable

原始问题中样本数据的输出是:

 yearofsale | age_group |  0-50  | 51-100 | 101-150 | 151-200 | 201-250
------------+-----------+--------+--------+---------+---------+---------
    2008    |   16-35   |  NULL  |  60.00 |   40.00 |   NULL  |   NULL
    2009    |   16-35   | 100.00 |  NULL  |   NULL  |   NULL  |   NULL
    2009    |   36-40   |  NULL  |  NULL  |   NULL  |   NULL  |  100.00
    2009    |   41-65   |  NULL  | 100.00 |   NULL  |   NULL  |   NULL

答案 1 :(得分:0)

这是一种动态方法,您将获得 FULL Matrix

矩阵层存储在表格中,通过调整参数,您可以更改轴甚至源。

请考虑以下事项:

Declare @Source varchar(150)= 'YourTable'
Declare @KeyCol varchar(150)= 'Year'
Declare @YTier  varchar(50) = 'Age'
Declare @YMeas  varchar(50) = 'Age'
Declare @XTier  varchar(50) = 'Score'
Declare @XMeas  varchar(50) = 'Score'


Declare @SQL varchar(max) = '
;with cte1 as (
      Select '+@KeyCol+'
            ,YSeq   = max(Y.Seq)
            ,YTitle = max(Y.Title)
            ,XSeq   = max(X.Seq)
            ,XTitle = max(X.Title)
            ,Value  = sum(Weight) 
       From  '+@Source+' A
       Join  Tier Y on (Y.Tier='''+@YTier+''' and A.'+@YMeas+'   between Y.R1 and Y.R2)
       Join  Tier X on (X.Tier='''+@XTier+''' and A.'+@XMeas+' between X.R1 and X.R2)
       Group By '+@KeyCol+',Y.Seq,X.Seq
       Union All
       Select '+@KeyCol+'
             ,YSeq   = Y.Seq
             ,YTitle = Y.Title
             ,XSeq   = X.Seq
             ,XTitle = X.Title
             ,Value  = 0
       From  (Select Distinct '+@KeyCol+' from '+@Source+') A
       Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@YTier+''') Y
       Cross Join (Select Distinct Seq,Title From Tier where Tier='''+@XTier+''') X )
, cte2 as (Select '+@KeyCol+',YSeq,RowTotal=sum(Value) from cte1 Group By '+@KeyCol+',YSeq)
, cte3 as (Select A.*
                 ,PctRow = Format(case when B.RowTotal=0 then 0 else (A.Value*100.0)/B.RowTotal end,''#0.0'')
           From   cte1 A
           Join   cte2 B on A.'+@KeyCol+'=B.'+@KeyCol+' and A.YSeq=B.YSeq )
 Select * 
  Into #Temp 
  From cte3

Declare @SQL2 varchar(max) = Stuff((Select '','' + QuoteName(Title) From Tier where Tier='''+@XTier+''' Order by Seq For XML Path('''')),1,1,'''') 
Select  @SQL2 = ''
Select ['+@KeyCol+'],[YTitle] as '+@YTier+','' + @SQL2 + ''
 From  (Select '+@KeyCol+',YSeq,YTitle,XTitle,PctRow=max(PctRow) from #Temp Group BY '+@KeyCol+',YSeq,YTitle,XTitle) A
 Pivot (max(PctRow) For [XTitle] in ('' + @SQL2 + '') ) p''
Exec(@SQL2);
'
Exec(@SQL)

返回

enter image description here

这些层存储在一般结构中。这允许多个版本。层表如下所示:

enter image description here