How do I get the mean, median, mode and range in a single SELECT query?

Asked: 2016-06-10 17:02:22

Tags: sql sql-server sql-server-2012

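I am trying to get the mean, median, mode and range for a set of values in a table. I can get the average, but the median, range and mode come out wrong.

Below is the code I tried.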

Select 
    CDS.[Commodity_SourceSeriesID_LongDesc] AS 'Description',
    TD.TimeDimension_Year AS 'Year',
    AVG(DV.DataValues_AttributeValue) AS 'Average/Mean',
    MAX(dv.DataValues_AttributeValue) AS 'Maximum value for the Year',
    MIN(dv.DataValues_AttributeValue) AS 'Minimum value for the Year',
    ((MAX(dv.DataValues_AttributeValue) + MIN(dv.DataValues_AttributeValue)) / 2) AS 'Median',
    --,(SELECT TOP 1 with ties DataValues_AttributeValue
    --FROM   [CoSD].[DataValues] 
    --WHERE  DataValues_AttributeValue IS Not NULL AND DataValues_ERSCommodity_ID = 157 and DataValues_DataRowLifecyclePhaseID = 1
    --GROUP  BY DataValues_AttributeValue
    --ORDER  BY COUNT(*) DESC) AS Mode
    (MAX(dv.DataValues_AttributeValue) - MIN(dv.DataValues_AttributeValue))  AS 'Range'
FROM 
    [CoSD].[DataValues] DV 
INNER JOIN 
    [CoSD].[CommodityDataSeries] CDS ON CDS.Commodity_ID = DV.DataValues_Commodity_ID
INNER JOIN 
    [CoSD].[TimeDimension_LU] TD ON TD.TimeDimension_ID = DV.DataValues_TimeDimension_ID
WHERE 
    DataValues_Commodity_ID = 157  
    AND DataValues_DataRowLifecyclePhaseID IN (1, 4)
GROUP BY 
    DV.DataValues_TimeDimension_ID,
    CDS.Commodity_SourceSeriesID_LongDesc,
    TD.TimeDimension_Year

Is there a way to achieve this?

Thanks

3 answers:

Answer 0: (score: 1)

I think you would probably rather do something like this:

select dbo.Median(DataValues_AttributeValue)
from ...

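There is no neat way to get the median or mode in the same manner as the native aggregates such as avg, max, min and so on. However, if you want something elegant, you could try a .NET CLR aggregate: implement median and mode in C# and call them as in the snippet above.

This is what I have done in the past.

Answer 1: (score: 1)

Not sure if this will help, but here is some SQL that lets me generate a set of statistics (..., mean, median, mode, ...) within a group.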

  • cteBase is your core data (not aggregated or grouped)
  • cteMedian produces the median of cteBase
  • cteMode computes the mode of cteBase

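I only calculate one measure, but I suspect it could easily be extended. If I had grouping fields beyond "GrpByYear", the grouping would have to be extended to a composite field.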

;with cteBase as (
     Select RowNr=Row_Number() over (Partition By Year(TR_Date) Order By Year(TR_Date),TR_Y10)
           ,GrpByYear = Year(TR_Date)
           ,Measure = TR_Y10
     From [Chinrus-Series].[dbo].[DS_Treasury_Rates]
     Where Year(TR_Date)>2014
    )
    ,cteMedian as (
        -- Row at position Max(RowNr)/2 (integer division) within each year
        Select A.GrpByYear, A.Measure
        From cteBase A
        Join (Select GrpByYear, RowNr = Max(RowNr)/2
              From cteBase
              Group by GrpByYear) B
          on (A.GrpByYear = B.GrpByYear and A.RowNr = B.RowNr)
    )
    ,cteMode as (
        -- Most frequent Measure per year; Row_Number keeps one row, so ties are broken arbitrarily
        Select *
        From (Select RowNr = Row_Number() over (Partition By GrpByYear Order by Count(*) Desc)
                    ,GrpByYear
                    ,Measure
                    ,Hits = Count(*)
              From cteBase
              Group by GrpByYear, Measure) A
        Where RowNr = 1
    )
    Select A.GrpByYear
          ,RecordCount   = Count(*)
          ,DistinctCount = Count(Distinct A.Measure)
          ,SumTotal      = Sum(A.Measure)
          ,Minimum       = Min(A.Measure)
          ,Maximum       = Max(A.Measure)
          ,Mean          = Avg(A.Measure)
          ,Median        = Max(B.Measure)
          ,Mode          = Max(C.Measure)
          ,StdDev        = STDEV(A.Measure)
     From cteBase A
     Join cteMedian B on A.GrpByYear=B.GrpByYear
     Join cteMode   C on A.GrpByYear=C.GrpByYear
     Group By A.GrpByYear
     Order By A.GrpByYear


Year    RecordCount DistinctCount   SumTotal    Minimum Maximum Mean    Median  Mode    StdDev
2016    110         43              204.82      1.63    2.25    1.862   1.84    1.83    0.128568690811108
2015    251         69              536.71      1.68    2.50    2.1382  2.16    2.20    0.1662836533952
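
One caveat about the cteMedian step: Max(RowNr)/2 is integer division, so it selects row n/2 of each group. For an odd row count that is the row just below the middle, for an even count it is the lower of the two middle rows, and a group with a single row gets no match at all. A minimal sketch of one common alternative follows; it averages the one or two middle rows. The table variable @Sample and its contents are hypothetical and only there to make the snippet self-contained.

```sql
-- Hypothetical sample data; names and values are illustrative only.
DECLARE @Sample TABLE (GrpByYear int, Measure decimal(9,2));
INSERT INTO @Sample (GrpByYear, Measure) VALUES
    (2015, 1.0), (2015, 2.0), (2015, 4.0),               -- odd count
    (2016, 1.0), (2016, 3.0), (2016, 5.0), (2016, 7.0);  -- even count

WITH cteBase AS (
    SELECT GrpByYear,
           Measure,
           RowNr = ROW_NUMBER() OVER (PARTITION BY GrpByYear ORDER BY Measure),
           Cnt   = COUNT(*)     OVER (PARTITION BY GrpByYear)
    FROM @Sample
)
SELECT GrpByYear,
       Median = AVG(Measure)   -- averages one row (odd Cnt) or two rows (even Cnt)
FROM cteBase
-- (Cnt + 1) / 2 and (Cnt + 2) / 2 are the same row when Cnt is odd
-- and the two middle rows when Cnt is even (integer division).
WHERE RowNr IN ((Cnt + 1) / 2, (Cnt + 2) / 2)
GROUP BY GrpByYear
ORDER BY GrpByYear;
```

With this sample data the median is 2 for 2015 (the middle of three rows) and 4 for 2016 (the average of 3 and 5).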

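Answer 2: (score: 1)

In SQL Server 2012 or later, it is usually easier to compute the median with the percentile_cont function. It looks like the rest of the question has already been addressed, but I thought you would want to know about this option too.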

https://msdn.microsoft.com/en-us/library/hh231473.aspx
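
As a rough sketch of how this could look: in SQL Server, PERCENTILE_CONT is an analytic (window) function rather than a plain aggregate, so it needs an OVER clause and returns a value on every row. One way to combine it with ordinary aggregates is to compute it in a CTE and join it back. The table variable @Sample and its columns are hypothetical stand-ins for the real tables in the question.

```sql
-- Hypothetical sample data; names and values are illustrative only.
DECLARE @Sample TABLE (GrpByYear int, Measure decimal(9,2));
INSERT INTO @Sample (GrpByYear, Measure) VALUES
    (2015, 1.0), (2015, 2.0), (2015, 4.0),
    (2016, 1.0), (2016, 3.0), (2016, 5.0), (2016, 7.0);

WITH cteMedian AS (
    -- DISTINCT collapses the per-row window result to one row per year
    SELECT DISTINCT
           GrpByYear,
           Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Measure)
                    OVER (PARTITION BY GrpByYear)
    FROM @Sample
)
SELECT s.GrpByYear,
       Mean    = AVG(s.Measure),
       Median  = MAX(m.Median),                  -- one value per year, MAX just satisfies GROUP BY
       [Range] = MAX(s.Measure) - MIN(s.Measure)
FROM @Sample s
JOIN cteMedian m ON m.GrpByYear = s.GrpByYear
GROUP BY s.GrpByYear
ORDER BY s.GrpByYear;
```

PERCENTILE_CONT interpolates between the two middle values when the row count is even, so 2016 in this sample yields a median of 4; PERCENTILE_DISC would instead return a value that actually occurs in the data.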