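I am writing a function that computes a trimmed mean. To do that, I remove the highest and lowest percent of the data and then compute the mean as usual. What I have so far is: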
def trimmed_mean(data, percent):
    from numpy import percentile
    if percent < 50:
        data_trimmed = [i for i in data
                        if i > percentile(data, percent)
                        and i < percentile(data, 100-percent)]
    else:
        data_trimmed = [i for i in data
                        if i < percentile(data, percent)
                        and i > percentile(data, 100-percent)]
    return sum(data_trimmed) / float(len(data_trimmed))
But I am clearly getting a wrong result. For [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6], the 10% trimmed mean should be 20.16, but I get 20.0.
Is there another way to remove the top and bottom of the data in Python? Or am I doing something else wrong?
Answer 0 (score: 6)
You can take a look at this related question: Trimmed Mean with Percentage Limit in Python?
In short, with scipy version > 0.14.0 the following works:
from scipy import stats
# the second argument is the fraction to cut from each end, e.g. 0.1 for 10%
m = stats.trim_mean(X, percentage)
If you would rather not depend on an external library, you can of course fall back to the approach shown in Chip Grandits' answer.
Answer 1 (score: 2)
I would suggest sorting the array first and then taking a "slice out of the middle".
# sort with some "fancy" numpy sort or just plain old sorted()
sorted_data = sorted(data)
n = len(sorted_data)
# values to drop from each end (percent on a 0-100 scale);
# int() truncates, so you may want some rounding logic if n is small
outliers = int(n * percent / 100)
trimmed_data = sorted_data[outliers:n - outliers]
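Wrapped into a function, the sort-and-slice idea might look like the sketch below. The name `trimmed_mean_sliced`, the input checks, and the choice to truncate the count with `int()` are assumptions made for illustration; `percent` is taken on a 0-100 scale and must stay below 50 so that something is left to average.

```python
def trimmed_mean_sliced(data, percent):
    """Mean of data after dropping percent% of the values from each end.

    Hypothetical helper; percent is expected in the range [0, 50).
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n == 0:
        raise ValueError("data must not be empty")
    # int() truncates, so 10% of 31 values drops 3 values per side.
    outliers = int(n * percent / 100)
    trimmed = sorted_data[outliers:n - outliers]
    return sum(trimmed) / float(len(trimmed))


data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
        19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
print(trimmed_mean_sliced(data, 10))  # prints 20.16
```

Using an explicit end index (`n - outliers`) rather than a negative one keeps the slice correct when nothing is trimmed.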
Answer 2 (score: 1)
Here you go:
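import numpy as np

def trimmed_mean(data, percent):
    # percent is given on a 0-100 scale
    data = np.sort(np.asarray(data))
    # number of values to drop from each end
    trim = int(percent * data.size / 100.0)
    # explicit end index: data[trim:-trim] would be empty when trim == 0
    return data[trim:data.size - trim].mean()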
Answer 3 (score: 1)
Maybe this will work:
data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
percent = .1  # == 10%

def trimmed_mean(data, percent):
    # sort list
    data = sorted(data)
    # number of elements to remove from both ends of list
    g = int(percent * len(data))
    # remove elements (explicit end index, so g == 0 keeps everything)
    data = data[g:len(data) - g]
    # cast sum to float to avoid integer division in Python 2
    return float(sum(data)) / len(data)

print(trimmed_mean(data, percent))
Output:
$ python trimmed_mean.py
20.16
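For comparison, the percentile-based filter from the question ends up at 20.0 on the same data, apparently because its strict `>` and `<` comparisons also throw away every value tied with a cutoff. The sketch below illustrates this without numpy: `percentile_linear` is a hand-written stand-in, assumed to behave like the default linear interpolation of `numpy.percentile`, and is not a numpy function.

```python
def percentile_linear(sorted_data, q):
    """Hypothetical stand-in for numpy.percentile (linear interpolation)."""
    pos = (len(sorted_data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_data) - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


data = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
        19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
ordered = sorted(data)
low = percentile_linear(ordered, 10)   # 12.0
high = percentile_linear(ordered, 90)  # 32.0

# Strict comparisons, as in the question: both 12s and the 32 go as well.
filtered = [x for x in data if low < x < high]
filtered_mean = sum(filtered) / float(len(filtered))

# Slicing, as in the answers: exactly 3 values leave each end.
sliced = ordered[3:len(ordered) - 3]
sliced_mean = sum(sliced) / float(len(sliced))

print(len(filtered), filtered_mean)  # prints 23 20.0
print(len(sliced), sliced_mean)      # prints 25 20.16
```

The filter removes 8 values instead of 6, which accounts for the difference between the two results.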