Truncated mean in Redshift

Time: 2015-09-03 13:35:11

Tags: postgresql amazon-redshift

How do I calculate a truncated mean in a Redshift database? I want it to run on a very large dataset.

2 answers:

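Answer 0: (score: 0)

Redshift includes the usual SQL statistical functions, including NTILE, which is what you need here. Split the rows into quartiles ordered by the metric and average only the middle two: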

SELECT AVG(CASE WHEN quartile IN (2,3) THEN my_metric ELSE NULL END) central_mean
      ,AVG(my_metric) mean
FROM (SELECT my_metric, NTILE(4) OVER (ORDER BY my_metric) quartile
        -- LIMIT 1000 only samples the table for a quick test; drop it to use every row
        FROM (SELECT * FROM my_table LIMIT 1000 ) t ) t
;
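The quartile query above trims 25% from each tail. If a narrower trim is wanted, the same idea can be sketched with more buckets. The version below is only an illustration: it reuses the placeholder names my_table and my_metric from the answer, and the 10% cut and the column alias trimmed_mean_10pct are assumptions, not something the answer specifies.

```sql
-- Sketch: keep deciles 2..9, i.e. drop roughly the lowest and highest 10% of rows.
SELECT AVG(CASE WHEN decile BETWEEN 2 AND 9 THEN my_metric END) AS trimmed_mean_10pct,
       AVG(my_metric) AS mean
FROM (
    SELECT my_metric,
           NTILE(10) OVER (ORDER BY my_metric) AS decile
    FROM my_table
    -- NULLs sort last by default and would otherwise occupy the top buckets
    WHERE my_metric IS NOT NULL
) t;
```

Because NTILE assigns rows by position rather than by value, tied values can fall on either side of a bucket boundary, so the cut is approximate when the metric has many duplicates.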

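Answer 1: (score: 0)

You can compute the percentile thresholds for the portion you want to remove from the set, filter out the metric values that fall outside those thresholds, and then take the average of what remains.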

SELECT AVG(your_metric) AS truncated_mean
FROM (
   SELECT
     your_metric,
     -- 10th percentile: lower boundary
     PERCENTILE_DISC(0.1)
       WITHIN GROUP (ORDER BY your_metric) OVER () AS lower_threshold,
     -- 90th percentile: upper boundary
     PERCENTILE_DISC(0.9)
       WITHIN GROUP (ORDER BY your_metric) OVER () AS higher_threshold
   FROM your_table
 ) t1
-- Strict comparisons also drop rows equal to either threshold
WHERE your_metric > lower_threshold AND your_metric < higher_threshold;
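The query above attaches both thresholds to every row through a window function. A possible variant, sketched here under assumptions, computes the two thresholds once in a one-row subquery and joins them back. It swaps in the aggregate form of PERCENTILE_CONT, which interpolates between values, so the thresholds can differ slightly from the PERCENTILE_DISC ones. The aliases m, b and truncated_mean are illustrative, and whether this runs faster on a large table has not been measured here.

```sql
-- Sketch: compute both thresholds once, then filter the table against them.
SELECT AVG(m.your_metric) AS truncated_mean
FROM your_table m
CROSS JOIN (
    -- One-row subquery holding the 10th and 90th percentile of the metric
    SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY your_metric) AS lower_threshold,
           PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY your_metric) AS higher_threshold
    FROM your_table
) b
WHERE m.your_metric > b.lower_threshold
  AND m.your_metric < b.higher_threshold;
```

Both percentile calls sort by the same expression, which keeps the subquery within Redshift's rule that sort-based aggregates in one statement share their ORDER BY.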