How do I calculate a truncated mean in a Redshift database? I need it to run on a very large dataset.
Answer 0 (score: 0)
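Redshift includes the common SQL statistical functions, including NTILE, which is the one you need here. NTILE(4) splits the rows, ordered by the metric, into four equally sized buckets; averaging only buckets 2 and 3 gives the mean of the middle 50% of the values.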
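SELECT AVG(CASE WHEN quartile IN (2, 3) THEN my_metric END) AS central_mean -- mean of the middle two quartiles
     , AVG(my_metric) AS mean -- plain mean, for comparison
FROM (SELECT my_metric
           , NTILE(4) OVER (ORDER BY my_metric) AS quartile -- rank rows by the metric being averaged
      FROM (SELECT * FROM my_table LIMIT 1000) t -- small sample for testing; drop the LIMIT to use the whole table
     ) t
;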
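A quick way to sanity-check the quartile logic before running it on Redshift is to try the same query on a tiny in-memory table. The sketch below is not Redshift code: it assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later (needed for window functions), and the sample values and the quartile_means helper are made up for illustration.

```python
import sqlite3

# Hypothetical sample: seven ordinary readings plus one extreme outlier.
QUARTILE_SAMPLE = [1, 2, 3, 4, 5, 6, 7, 100]

# Same shape as the query in the answer, ordering by the metric itself.
QUERY = """
SELECT AVG(CASE WHEN quartile IN (2, 3) THEN my_metric END) AS central_mean,
       AVG(my_metric) AS mean
FROM (SELECT my_metric,
             NTILE(4) OVER (ORDER BY my_metric) AS quartile
      FROM my_table) t
"""


def quartile_means(values):
    """Return (central_mean, mean) for the given values via the NTILE query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE my_table (my_metric REAL)")
        conn.executemany(
            "INSERT INTO my_table (my_metric) VALUES (?)",
            [(v,) for v in values],
        )
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


central_mean, mean = quartile_means(QUARTILE_SAMPLE)
print(central_mean, mean)
```

With this sample the outlier pulls the plain mean up to 16.0, while the mean of quartiles 2 and 3 is 4.5.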
Answer 1 (score: 0)
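You can compute the threshold values for the percentiles you want to cut from the set, then filter out the metric values that fall outside those thresholds, and finally take the average of what remains.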
SELECT avg(your_metric)
FROM (
    SELECT
        your_metric,
        PERCENTILE_DISC(0.1) -- 10% lower boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS lower_threshold,
        PERCENTILE_DISC(0.9) -- 90% higher boundary
            WITHIN GROUP (ORDER BY your_metric) OVER () AS higher_threshold
    FROM your_table
) t1
WHERE your_metric > lower_threshold
  AND your_metric < higher_threshold
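To see what the threshold filter actually keeps, here is a small pure-Python sketch of the same three steps (thresholds, filter, average). It is an illustration, not Redshift code: percentile_disc and truncated_mean are hypothetical helpers, the sample values are made up, and the percentile rule assumes the usual SQL definition of PERCENTILE_DISC (the first value, in sorted order, whose cumulative share of rows reaches the requested fraction).

```python
def percentile_disc(sorted_values, p):
    """Return the first value whose cumulative share of rows is at least p.

    Assumes sorted_values is non-empty and sorted ascending.
    """
    n = len(sorted_values)
    for i, value in enumerate(sorted_values):
        if (i + 1) / n >= p:
            return value
    return sorted_values[-1]


def truncated_mean(values, lower_p=0.1, upper_p=0.9):
    """Average the values strictly between the two percentile thresholds."""
    ordered = sorted(values)
    lower = percentile_disc(ordered, lower_p)
    upper = percentile_disc(ordered, upper_p)
    # Strict comparisons, as in the WHERE clause of the query.
    kept = [v for v in ordered if lower < v < upper]
    # Like SQL AVG over zero rows, return None when nothing is left.
    return sum(kept) / len(kept) if kept else None


# Hypothetical sample: nine ordinary readings plus one extreme outlier.
THRESHOLD_SAMPLE = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

print(truncated_mean(THRESHOLD_SAMPLE))
print(sum(THRESHOLD_SAMPLE) / len(THRESHOLD_SAMPLE))
```

On this sample the thresholds are 1 and 9, so the strict comparisons drop 1, 9 and 100 and the result is 5.0, against a plain mean of 14.5. Rows equal to a threshold are excluded as well; using >= and <= in the filter would keep them if that is closer to what you want.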