Question

I have the following file: data.txt.

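This file contains a number of bounding boxes and their respective heights. I wrote a function that extracts the height of every box from the JSON input data.txt: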

heights = [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20, 20, 20, 21, 20, 20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20, 13, 20, 18, 18, 13, 12, 19, 25, 17, 13, 38, 38, 20, 19, 16]

I wrote the following script to plot the height of each box:

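import matplotlib.pyplot as plt
import seaborn as sns

# heights extracted from data.txt (the values listed above)
heights = [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20,
           20, 20, 21, 20, 20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20,
           13, 20, 18, 18, 13, 12, 19, 25, 17, 13, 38, 38, 20, 19, 16]

box_number = []
box_height = []

for index2, num2 in enumerate(heights):
    print('box number', index2, 'box height', num2)
    box_number.append(index2)
    box_height.append(num2)

# ax = sns.lineplot(x=box_number, y=box_height)
# recent seaborn versions require x and y as keyword arguments
ax = sns.stripplot(x=box_number, y=box_height)
ax.set(xlabel='box number', ylabel='height of box')

# giving title to the plot
plt.title('My first graph')

# function to show plot
plt.show()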

This is the output:

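I want to write a function that prints the boxes whose height is much larger than the average height. In short, it should print box numbers 0, 44 and 45. How can I do this?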

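(Each time I get a different set of boxes, but I always have to compute the average of their heights and print the boxes that are too tall.)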

Answer 1

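There are several strategies for finding outliers, and how you define an outlier matters most. If you want to do a simple calculation like the one you describe, you can do the following: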

import numpy as np

# heights
hs = np.array([43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20, 20, 20, 21, 20,
               20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20, 13, 20, 18, 18, 13, 12, 19,
               25, 17, 13, 38, 38, 20, 19, 16])

# let's say that an outlier is a height that is farther than 2*std from the mean
outliers_definition = np.abs(hs - np.mean(hs)) > 2 * np.std(hs)

# you can get their indexes this way
outliers_idx = np.argwhere(outliers_definition)

print(outliers_idx)
# [[ 0]
#  [44]
#  [45]]
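The question asks for a function that reports only the boxes that are too tall. Below is a minimal sketch, assuming "too tall" means more than `k` standard deviations above the mean (the same 2 * std rule as above, but one-sided). The name `find_tall_boxes` and the parameter `k` are hypothetical, not part of any library:

```python
import numpy as np


def find_tall_boxes(heights, k=2.0):
    """Hypothetical helper: return the box numbers (indices) whose height
    exceeds mean + k * std. One-sided, so unusually short boxes are ignored."""
    h = np.asarray(heights, dtype=float)
    # np.std is the population standard deviation, as in the answer above
    threshold = h.mean() + k * h.std()
    return np.flatnonzero(h > threshold).tolist()


hs = [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20,
      20, 20, 21, 20, 20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20,
      13, 20, 18, 18, 13, 12, 19, 25, 17, 13, 38, 38, 20, 19, 16]

for box in find_tall_boxes(hs):
    print('box number', box, 'box height', hs[box])
```

Because the threshold is recomputed from whatever list is passed in, the same call works for each new set of boxes.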

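Note that the mean here is itself pulled up by the outliers. You could use the median instead, for example. If you want something more robust, there is a large body of literature on outlier detection; I suggest you look into it.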
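Following the suggestion to use the median, here is a sketch of one common robust variant: the modified z-score based on the median absolute deviation (MAD), with the 3.5 cutoff recommended by Iglewicz and Hoaglin. The function name `find_tall_boxes_robust` is hypothetical, and the one-sided test (only boxes above the median) is an assumption that matches the question:

```python
import numpy as np


def find_tall_boxes_robust(heights, threshold=3.5):
    """Hypothetical helper: return box numbers whose modified z-score,
    0.6745 * (x - median) / MAD, is above `threshold` (one-sided)."""
    h = np.asarray(heights, dtype=float)
    med = np.median(h)
    # MAD: median of the absolute deviations from the median
    mad = np.median(np.abs(h - med))
    if mad == 0:
        # at least half of the boxes share the median height
        raise ValueError('MAD is zero; the modified z-score is undefined')
    scores = 0.6745 * (h - med) / mad
    # only positive scores count, so only too-tall boxes are reported
    return np.flatnonzero(scores > threshold).tolist()


hs = [43, 17, 23, 24, 17, 27, 19, 19, 24, 22, 8, 8, 26, 25, 18, 19, 20,
      20, 20, 21, 20, 20, 22, 18, 18, 19, 19, 16, 13, 20, 20, 19, 19, 20,
      13, 20, 18, 18, 13, 12, 19, 25, 17, 13, 38, 38, 20, 19, 16]

print(find_tall_boxes_robust(hs))
```

On this data the median is 19 and the MAD is 2, so the cutoff lies roughly 10.4 units above the median and the result is again boxes 0, 44 and 45. A two-sided version (using `np.abs(scores)`) would also flag boxes 10 and 11 (height 8), which shows how much the definition of an outlier matters.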
