How to remove the top and bottom n% of data

Time: 2016-03-07 15:00:16

Tags: python numpy

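I'm writing a function that computes a trimmed mean. To do that, I remove the highest and lowest percent of the data and then compute the mean as usual. What I have so far is: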

def trimmed_mean(data, percent):
    from numpy import percentile

    if percent < 50:
        data_trimmed = [i for i in data
                        if i > percentile(data, percent)
                        and i < percentile(data, 100-percent)]
    else:
        data_trimmed = [i for i in data
                        if i < percentile(data, percent)
                        and i > percentile(data, 100-percent)]

    return sum(data_trimmed) / float(len(data_trimmed))

But I'm clearly getting the wrong result. For [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6], the 10% trimmed mean should be 20.16, but I get 20.0.

Is there another way to remove the top and bottom of the data in Python? Or am I doing something else wrong?

4 answers:

Answer 0 (score: 6)

You can take a look at this related question: Trimmed Mean with Percentage Limit in Python?

In short, with scipy version > 0.14.0 the following works:

from scipy import stats
# the second argument is the fraction cut from each end, e.g. 0.1 for 10%
m = stats.trim_mean(X, percentage)

If you'd rather not depend on an external library, you can of course fall back to the approach shown in Chip Grandits' answer.
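For a version with no third-party dependency, the sort-and-slice idea fits in a few lines of plain Python. This is only a sketch: the name `trim_mean_py` and its `proportion` parameter (a fraction per end, e.g. 0.1) are made up for illustration, and it is not claimed to match scipy in every edge case.

```python
def trim_mean_py(values, proportion):
    """Mean after cutting `proportion` (a fraction, 0 <= p < 0.5) from each end.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Round down: number of items dropped from each end.
    cut = int(proportion * len(ordered))
    # len(ordered) - cut (not -cut) so that cut == 0 keeps everything.
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / float(len(kept))


data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
        19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
result = trim_mean_py(data, 0.1)
print(result)
```

With the 31 values from the question, 3 items are dropped from each end and the mean of the remaining 25 is taken.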

Answer 1 (score: 2)

I suggest sorting the array first and then taking a slice out of the middle.

# sort first: some "fancy" numpy sort or even just plain old sorted()
sorted_data = sorted(data)
n = len(sorted_data)
# int() keeps the slice index an integer; may want some rounding logic if n is small
outliers = int(n * percent / 100)
trimmed_data = sorted_data[outliers:n - outliers]
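The rounding remark matters because n*percent/100 is rarely a whole number. Below is a small sketch of the usual choices; the helper `trim_count` and its `mode` argument are invented for this example and do not come from the answer.

```python
import math


def trim_count(n, percent, mode="floor"):
    """How many items to drop from each end of n items (percent on a 0-100 scale).

    Hypothetical helper; `mode` picks the rounding rule.
    """
    exact = n * percent / 100.0
    if mode == "floor":
        k = int(math.floor(exact))
    elif mode == "ceil":
        k = int(math.ceil(exact))
    elif mode == "nearest":
        k = int(math.floor(exact + 0.5))
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    # Never trim so much that nothing is left in the middle.
    return max(0, min(k, (n - 1) // 2))


# 31 items at 10% gives an exact value of 3.1 items per end.
counts = {mode: trim_count(31, 10, mode) for mode in ("floor", "ceil", "nearest")}
print(counts)
```

For 31 items at 10%, rounding down or to nearest drops 3 per end while rounding up drops 4, so the choice can change the result on small samples.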

Answer 2 (score: 1)

Here:

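import numpy as np

def trimmed_mean(data, percent):
    # percent is on a 0-100 scale
    data = np.array(sorted(data))
    trim = int(percent*data.size/100.0)
    # data.size - trim rather than -trim, so that trim == 0 keeps everything
    return data[trim:data.size - trim].mean()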

Answer 3 (score: 1)

Maybe this will work:

data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
percent = .1 # == 10%

def trimmed_mean(data, percent):
    # sort list
    data = sorted(data)
    # number of elements to remove from both ends of list
    g = int(percent * len(data))
    # remove elements
    data = data[g:len(data) - g]  # unlike data[g:-g], also works when g == 0
    # cast sum to float to avoid integer division in Python 2
    return float(sum(data)) / len(data)

print(trimmed_mean(data, percent))

Output:

$ python trimmed_mean.py 
20.16
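
The 20.16 here differs from the 20.0 reported in the question because a strict comparison against percentile thresholds also discards every value tied with a threshold. The sketch below reproduces that effect without numpy. `percentile_linear` is a hand-written stand-in that, as an assumption, mimics linear interpolation between order statistics (numpy's default method), so treat it as an illustration rather than a drop-in replacement.

```python
def percentile_linear(sorted_values, q):
    """q-th percentile (0-100) by linear interpolation on sorted input.

    Hypothetical stand-in for numpy.percentile, for illustration only.
    """
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
        19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
ordered = sorted(data)

low = percentile_linear(ordered, 10)    # 12.0 for this data
high = percentile_linear(ordered, 90)   # 32.0 for this data

# Threshold filter, as in the question: values equal to a threshold go too.
filtered = [x for x in ordered if low < x < high]
filtered_mean = sum(filtered) / float(len(filtered))

# Count-based slice: exactly int(0.1 * n) items leave each end.
g = int(0.1 * len(ordered))
sliced = ordered[g:len(ordered) - g]
sliced_mean = sum(sliced) / float(len(sliced))

print(len(filtered), filtered_mean)  # 23 values kept, mean 20.0
print(len(sliced), sliced_mean)      # 25 values kept, mean 20.16
```

The filter removes four values from each end (both 12s at the bottom, and 32 along with 33, 33, 37 at the top), while the slice removes three, which accounts for the gap between the two means.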