Winsorizing by group

Time: 2016-02-22 14:46:25

Tags: sql ms-access group-by outliers

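I need to find the 99th and 1st percentiles of a variable for each date. So far I have managed to do this only over the whole period. I would like to "loop" the following query (which does work) over each date, i.e. basic winsorizing per date, the way a simple GROUP BY would, but GROUP BY does not work with TOP PERCENT.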

SELECT Date, ID, Value,
       IIf(Value > [upper_threshold], [upper_threshold],
           IIf(Value < [lower_threshold], [lower_threshold], Value)) AS winsor_Value
FROM MyTable,
     (SELECT [lower_threshold], [upper_threshold]
      FROM (SELECT MAX(Value) AS lower_threshold
            FROM (SELECT TOP 1 PERCENT Value FROM MyTable ORDER BY Value)) AS t1,
           (SELECT MIN(Value) AS upper_threshold
            FROM (SELECT TOP 1 PERCENT Value FROM MyTable ORDER BY Value DESC)));

My data looks like this:

[image: sample of the data]

I have 700,000 rows.

Many thanks.

1 answer:

Answer 0 (score: 1)

I am not sure whether the following works in MS Access, but it is worth a try. To get the 99th percentile value for each date:

select t.date,
       (select min(t2.value)
        from (select top 1 percent t2.*
              from t as t2
              where t2.date = t.date
              order by t2.value desc
             ) as t2
       ) as percentile_99
from (select distinct date
      from t
     ) as t;

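I do not know whether the MS Access scoping rules allow you to correlate a subquery more than one level deep. If they do, the approach above should work for any percentile.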
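If the correlated version turns out not to run in Access, the intended per-date result can at least be cross-checked outside the database. What follows is only a rough Python sketch of that logic, not an Access solution. The function name `winsorize_by_date` and the `(date, id, value)` row layout are made up for illustration, and the "TOP n PERCENT" row count is assumed to round up. Ties are ignored, which does not affect a MIN/MAX threshold.

```python
import math
from collections import defaultdict


def winsorize_by_date(rows, pct=1.0):
    """Clamp each value to its own date's lower/upper percentile thresholds.

    rows: list of (date, id, value) tuples (hypothetical layout).
    Returns a list of (date, id, value, winsor_value) in the input order.
    """
    # Collect the values of each date, like GROUP BY Date would.
    by_date = defaultdict(list)
    for date, _id, value in rows:
        by_date[date].append(value)

    thresholds = {}
    for date, values in by_date.items():
        ordered = sorted(values)
        # Size of a "TOP pct PERCENT" slice; rounding up is an assumption.
        k = max(1, math.ceil(len(ordered) * pct / 100.0))
        lower = ordered[k - 1]  # MAX of the k smallest values
        upper = ordered[-k]     # MIN of the k largest values
        thresholds[date] = (lower, upper)

    result = []
    for date, _id, value in rows:
        lower, upper = thresholds[date]
        # Same effect as the nested IIf in the question, per date.
        result.append((date, _id, value, min(max(value, lower), upper)))
    return result
```

With 20 rows per date and `pct=10`, the slice holds two rows, so the smallest and largest value of each date are pulled in to the second smallest and second largest. With `pct=1` a date needs more than 100 rows before anything is clamped.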