I am trying to work out the interquartile range from a series of numbers, which can be of any length, for example:
1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9
The values I need to work out from this data are the first quartile, the median, and the third quartile.
If I dump the array of numbers above into Microsoft Excel (row 1, columns A:M), I can use the following formulas:
=QUARTILE.INC(A1:M1,1)
=QUARTILE.INC(A1:M1,2)
=QUARTILE.INC(A1:M1,3)
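to get my answers.
I now need to work out these 3 values in either SQL Server or VB.NET. I can get the array values into whatever format or object either language needs, but I can't find any existing function equivalent to Excel's QUARTILE.INC.
Does anyone know how this could be achieved in either SQL Server or VB.NET?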
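Answer 0: (score: 4)
There may be an easier way, but to get quartiles you can use NTILE (Transact-SQL):
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
So for your data: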
SELECT 1 Val
INTO #temp
UNION ALL
SELECT 1
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 7
UNION ALL
SELECT 9
UNION ALL
SELECT 9
UNION ALL
SELECT 9
UNION ALL
SELECT 9
-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
Val
INTO #tempQuartiles
FROM #temp
SELECT *
FROM #tempQuartiles
DROP TABLE #temp
DROP TABLE #tempQuartiles
This produces:
Quartile Val
1 1
1 1
1 2
1 4
2 5
2 6
2 7
3 7
3 8
3 9
4 9
4 9
4 9
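From this you can work out the values you want.
So, by modifying the final SELECT (and running it before the temp tables are dropped), you can do this: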
SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile
This produces:
Quartile MaxVal
1 4
2 7
3 9
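For comparison, here is a minimal sketch that assumes SQL Server 2012 or later, where PERCENTILE_CONT is available. As far as I know, PERCENTILE_CONT interpolates linearly at rank 1 + p * (n - 1), which is the same rule QUARTILE.INC uses, whereas the NTILE bucket maxima above do no interpolation and only happen to land on whole-numbered ranks for this data. The derived table name nums is illustrative.

```sql
-- Sketch only: assumes SQL Server 2012 or later, where PERCENTILE_CONT exists.
-- The derived table "nums" and its column "Val" are illustrative names.
SELECT DISTINCT
       PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Val) OVER () AS Q1,
       PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY Val) OVER () AS Median,
       PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Val) OVER () AS Q3
FROM (VALUES (1), (1), (5), (6), (7), (8), (2), (4), (7), (9), (9), (9), (9)) AS nums (Val);
-- PERCENTILE_CONT is a window function here, so it returns the same three
-- values on every row; DISTINCT collapses them to a single row.
```

With 13 values the ranks are 4, 7 and 10, so the query should return 4, 7 and 9 for this data.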
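Answer 1: (score: 3)
We created a user-defined table type, which we use as the function's parameter, and then call the function as shown further down.
Our implementation uses the same calculation as Excel's PERCENTILE function.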
CREATE TYPE [dbo].[floatListType] AS TABLE (
[value] FLOAT NOT NULL
);
GO
CREATE FUNCTION [dbo].[getPercentile]
(
@data floatListType readonly,
@percentile float
)
RETURNS float
AS
BEGIN
declare @values table
(
value float,
idx int
);
insert into @values
select value, ROW_NUMBER() OVER (order by value) - 1 as idx
from @data;
declare @cnt int = (select count(*) from @values)
, @n float = (@cnt - 1) * @percentile + 1
, @k int = FLOOR(@n)
, @d float = @n - @k;
if (@k = 0)
return (select value from @values where idx = 0)
if (@k = @cnt)
return (select value from @values where idx = @cnt - 1)
if (@k > 0 AND @k < @cnt)
return (select value from @values where idx = @k - 1)
+ @d * ((select value from @values where idx = @k)
- (select value from @values where idx = @k - 1))
return null;
END
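You can use it like this to get the median and the quartiles (Q1 is simply the 0.25 percentile), for example:
declare @values dbo.floatListType;
insert into @values
select value from #mytable;
-- scalar user-defined functions must be called with their schema name
select dbo.getPercentile(@values, 0.25) as Q1,
dbo.getPercentile(@values, 0.5) as median,
dbo.getPercentile(@values, 0.75) as Q3;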
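As a quick check against the numbers in the question, here is a small sketch that assumes dbo.floatListType and dbo.getPercentile have already been created exactly as defined above. The variable name @sample is illustrative.

```sql
-- Sketch only: assumes dbo.floatListType and dbo.getPercentile from above
-- already exist in the current database. @sample is an illustrative name.
DECLARE @sample dbo.floatListType;

INSERT INTO @sample ([value])
VALUES (1), (1), (5), (6), (7), (8), (2), (4), (7), (9), (9), (9), (9);

-- 13 values, so the 1-based ranks are (13 - 1) * p + 1 = 4, 7 and 10.
-- These are whole numbers, so no interpolation is needed for this data.
SELECT dbo.getPercentile(@sample, 0.25) AS Q1,
       dbo.getPercentile(@sample, 0.5)  AS Median,
       dbo.getPercentile(@sample, 0.75) AS Q3;
```

Sorted, the data is 1, 1, 2, 4, 5, 6, 7, 7, 8, 9, 9, 9, 9, so the 4th, 7th and 10th values give 4, 7 and 9.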
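Answer 2: (score: 1)
If you want a SQL Server solution, I posted an Interquartile Range procedure on my blog a few years ago. It is based on dynamic SQL, so you can plug in any column you have access to. It has not been thoroughly tested, I was still learning the ropes at the time, and the code is a bit old now, but it may meet your needs out of the box, or at least give you a starting point for coding your own solution. Here is the gist of the code; see the blog post for an in-depth discussion.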
CREATE PROCEDURE [Calculations].[InterquartileRangeSP]
@DatabaseName as nvarchar(128) = NULL, @SchemaName as nvarchar(128), @TableName as nvarchar(128), @ColumnName AS nvarchar(128), @PrimaryKeyName as nvarchar(400), @OrderByCode as tinyint = 1, @DecimalPrecision AS nvarchar(50)
AS
SET @DatabaseName = @DatabaseName + '.'
DECLARE @SchemaAndTableName nvarchar(400)
SET @SchemaAndTableName = ISNULL(@DatabaseName, '') + @SchemaName + '.' + @TableName
DECLARE @SQLString nvarchar(max)
SET @SQLString = 'DECLARE @OrderByCode tinyint,
@Count bigint,
@LowerPoint bigint,
@UpperPoint bigint,
@LowerRemainder decimal(38,37), -- use the maximum precision and scale for these two variables to make the procedure flexible enough to handle large datasets; I suppose I could use a float
@UpperRemainder decimal(38,37),
@LowerQuartile decimal(' + @DecimalPrecision + '),
@UpperQuartile decimal(' + @DecimalPrecision + '),
@InterquartileRange decimal(' + @DecimalPrecision + '),
@LowerInnerFence decimal(' + @DecimalPrecision + '),
@UpperInnerFence decimal(' + @DecimalPrecision + '),
@LowerOuterFence decimal(' + @DecimalPrecision + '),
@UpperOuterFence decimal(' + @DecimalPrecision + ')
SET @OrderByCode = ' + CAST(@OrderByCode AS nvarchar(50)) + ' SELECT @Count=Count(' + @ColumnName + ')
FROM ' + @SchemaAndTableName +
' WHERE ' + @ColumnName + ' IS NOT NULL
SELECT @LowerPoint = (@Count + 1) / 4, @LowerRemainder = ((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) % 4) /4,
@UpperPoint = ((@Count + 1) *3) / 4, @UpperRemainder = (((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) *3) % 4) / 4; -- multiply by 3 for the left side on the upper point to get 75 percent
WITH TempCTE
(' + @PrimaryKeyName + ', RN, ' + @ColumnName + ')
AS (SELECT ' + @PrimaryKeyName + ', ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY ' + @ColumnName + ' ASC) AS RN, ' + @ColumnName + '
FROM ' + @SchemaAndTableName + '
WHERE ' + @ColumnName + ' IS NOT NULL),
TempCTE2 (QuartileValue)
AS (SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @LowerRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @LowerPoint AND @LowerPoint + 1
UNION
SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @UpperRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @UpperPoint AND @UpperPoint + 1)
SELECT @LowerQuartile = (SELECT TOP 1 QuartileValue
FROM TempCTE2 ORDER BY QuartileValue ASC), @UpperQuartile = (SELECT TOP 1 QuartileValue
FROM TempCTE2 ORDER BY QuartileValue DESC)
SELECT @InterquartileRange = @UpperQuartile - @LowerQuartile
SELECT @LowerInnerFence = @LowerQuartile - (1.5 * @InterquartileRange), @UpperInnerFence = @UpperQuartile + (1.5 * @InterquartileRange), @LowerOuterFence = @LowerQuartile - (3 * @InterquartileRange), @UpperOuterFence = @UpperQuartile + (3 * @InterquartileRange)
--SELECT @LowerPoint AS LowerPoint, @LowerRemainder AS LowerRemainder, @UpperPoint AS UpperPoint, @UpperRemainder AS UpperRemainder
-- uncomment the line above to debug the inner calculations
SELECT @LowerQuartile AS LowerQuartile, @UpperQuartile AS UpperQuartile, @InterquartileRange AS InterQuartileRange, @LowerInnerFence AS LowerInnerFence, @UpperInnerFence AS UpperInnerFence, @LowerOuterFence AS LowerOuterFence, @UpperOuterFence AS UpperOuterFence
SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ', OutlierDegree
FROM (SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ',
''OutlierDegree'' = CASE WHEN (' + @ColumnName + ' < @LowerInnerFence AND ' + @ColumnName + ' >= @LowerOuterFence) OR (' +
@ColumnName + ' > @UpperInnerFence
AND ' + @ColumnName + ' <= @UpperOuterFence) THEN 1
WHEN ' + @ColumnName + ' < @LowerOuterFence OR ' + @ColumnName + ' > @UpperOuterFence THEN 2
ELSE 0 END
FROM ' + @SchemaAndTableName + '
WHERE ' + @ColumnName + ' IS NOT NULL) AS T1
ORDER BY CASE WHEN @OrderByCode = 1 THEN ' + @PrimaryKeyName + ' END ASC,
CASE WHEN @OrderByCode = 2 THEN ' + @PrimaryKeyName + ' END DESC,
CASE WHEN @OrderByCode = 3 THEN ' + @ColumnName + ' END ASC,
CASE WHEN @OrderByCode = 4 THEN ' + @ColumnName + ' END DESC,
CASE WHEN @OrderByCode = 5 THEN OutlierDegree END ASC,
CASE WHEN @OrderByCode = 6 THEN OutlierDegree END DESC'
--SELECT @SQLString -- uncomment this to debug string errors
EXEC (@SQLString)
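The fence arithmetic in the procedure (inner fences at 1.5 times the interquartile range beyond the quartiles, outer fences at 3 times) can be sketched on the question's data without dynamic SQL. This is not the procedure above: it swaps in PERCENTILE_CONT for the quartiles, which assumes SQL Server 2012 or later and follows the QUARTILE.INC rule, whereas the procedure positions its quartiles at (n + 1) / 4 and 3(n + 1) / 4 and so may return different values. The names nums, Val and q are illustrative.

```sql
-- Sketch only: assumes SQL Server 2012 or later for PERCENTILE_CONT.
-- "nums", "Val" and "q" are illustrative names, not part of the procedure above.
WITH q AS (
    SELECT DISTINCT
           PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Val) OVER () AS Q1,
           PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Val) OVER () AS Q3
    FROM (VALUES (1), (1), (5), (6), (7), (8), (2), (4), (7), (9), (9), (9), (9)) AS nums (Val)
)
SELECT Q1                   AS LowerQuartile,
       Q3                   AS UpperQuartile,
       Q3 - Q1              AS InterquartileRange,
       Q1 - 1.5 * (Q3 - Q1) AS LowerInnerFence,   -- mild outliers fall below this
       Q3 + 1.5 * (Q3 - Q1) AS UpperInnerFence,   -- or above this
       Q1 - 3 * (Q3 - Q1)   AS LowerOuterFence,   -- extreme outliers fall below this
       Q3 + 3 * (Q3 - Q1)   AS UpperOuterFence    -- or above this
FROM q;
```

For this data Q1 is 4 and Q3 is 9, so the interquartile range is 5, the inner fences are -3.5 and 16.5, and the outer fences are -11 and 24. None of the 13 values falls outside the inner fences.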