Question

I want to create box plots of the values within specific ranges. The data comes from a text file that looks like this:

range;int
3;200
3;200
3;200
3;200
3;200
3;200
3;100
3;200
3;200
5;400
5;400
5;400
5;400
5;400
5;400
5;300
5;400
5;400
5;400
5;400
5;400
5;400
5;400
5;300
5;400

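The first column is the range and the second column is the value. As you can see, the first column contains duplicates. I read the file like this: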

data = np.genfromtxt('out.txt', delimiter=';', names=True, dtype=int)
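To see what genfromtxt returns, here is a minimal sketch that reads a shortened copy of the sample through `io.StringIO` instead of out.txt. The `SAMPLE` string is a made-up stand-in for the file. With `names=True`, the header line becomes the field names. The result is therefore a structured array whose columns are accessed as `data['range']` and `data['int']`.

```python
import io

import numpy as np

# Made-up stand-in for out.txt: a shortened copy of the sample above
SAMPLE = """range;int
3;200
3;100
3;200
5;400
5;300
5;400
"""

# names=True turns the header line into field names; dtype=int is applied to every field
data = np.genfromtxt(io.StringIO(SAMPLE), delimiter=';', names=True, dtype=int)

print(data.dtype.names)          # ('range', 'int')
print(data['range'])             # the range column, duplicates included
print(data['int'])               # the value column
print(np.unique(data['range']))  # the distinct ranges, sorted
```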

If I try to use this data in a box plot:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.boxplot(data['int'], patch_artist=True)
plt.show()

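It only makes a single box plot for all the 'int' values. How can I restructure the data, or adjust the script, to get one box plot for each unique range?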
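ax.boxplot draws one box per dataset. A single 1-D array counts as one dataset, while a list of 1-D arrays gives one box each. So one way to get a box per range is to split the `int` column by the distinct values of the `range` column first.

Below is a minimal sketch, assuming the structured array `data` from genfromtxt above. The helper names `group_by_range` and `draw_range_boxplots` are hypothetical, and the small `example` array is made up.

```python
import numpy as np


def group_by_range(data):
    """Split the 'int' column into one array per distinct 'range' value.

    Returns (ranges, groups): the sorted distinct ranges and a list of
    1-D arrays, where groups[i] holds the values belonging to ranges[i].
    """
    ranges = np.unique(data['range'])  # sorted, duplicates removed
    # Boolean mask per range selects the matching rows of the 'int' column
    groups = [data['int'][data['range'] == r] for r in ranges]
    return ranges, groups


def draw_range_boxplots(ax, data):
    """Draw one box per range on an existing matplotlib Axes."""
    ranges, groups = group_by_range(data)
    # A list of arrays gives one box per array; positions places each
    # box at its range value on the x axis
    ax.boxplot(groups, positions=ranges, patch_artist=True)


# Made-up records with the same field names as out.txt
example = np.array([(3, 200), (3, 100), (5, 400), (5, 300), (5, 400)],
                   dtype=[('range', int), ('int', int)])
ranges, groups = group_by_range(example)
```

With a figure, this would be used as `fig, ax = plt.subplots()`, then `draw_range_boxplots(ax, data)`, then `plt.show()`.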

Answer 1

To reorder the data, you can use Python's built-in sorted function:

import numpy as np

data = np.genfromtxt('out.txt', delimiter=';', names=True, dtype=int)
data_sorted = sorted(data, key=lambda value: (value[0], value[1]))
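Because `data` is a NumPy structured array, NumPy can also do this sort itself. The result then stays an array, whereas Python's sorted returns a list of records.

Here is a small sketch using `np.sort` with its `order` argument, which sorts by `range` first and breaks ties by `int`. The inline `data` is a made-up stand-in for the file contents.

```python
import numpy as np

# Made-up records with the same fields as out.txt
data = np.array([(5, 400), (3, 200), (5, 300), (3, 100)],
                dtype=[('range', int), ('int', int)])

# Sort by 'range', then by 'int' within equal ranges; the result is still a structured array
data_sorted = np.sort(data, order=['range', 'int'])
print(data_sorted)
```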

Edit after dh81's comment above:

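If you want the sorted data for each range, you can find the distinct range values and build a dictionary that maps each range to its sorted records. Here is what I came up with: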

import numpy as np

# Get and sort the data
data = np.genfromtxt('out.txt', delimiter=';', names=True, dtype=int)
data_sorted = sorted(data, key=lambda value: (value[0], value[1]))

# Prepare dictionary to hold different arrays
data_dict = {}

# Find the different ranges needed
range_keys = set([i[0] for i in data])

# Populate each range with the values
for range_key in range_keys:
    range_values = []
    for data_point in data_sorted:
        if data_point[0] == range_key:
            range_values.append(data_point)
    data_dict[range_key] = range_values

print("Got the dictionary of arrays: {}".format(data_dict))
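The dictionary maps each range to a list of (range, int) records, while ax.boxplot wants the plain values. The sketch below covers that last step, assuming `data_dict` has the shape built above. The helper `dict_to_boxplot_input` is hypothetical, and the example dictionary is made up.

```python
def dict_to_boxplot_input(data_dict):
    """Turn {range: [(range, value), ...]} into inputs for ax.boxplot.

    Returns (ranges, values): the sorted range keys and, for each key,
    the list of values (the second field of every record).
    """
    ranges = sorted(data_dict)
    values = [[record[1] for record in data_dict[r]] for r in ranges]
    return ranges, values


# Made-up dictionary with the same shape as data_dict
example_dict = {5: [(5, 300), (5, 400)], 3: [(3, 100), (3, 200)]}
ranges, values = dict_to_boxplot_input(example_dict)
```

Given a matplotlib Axes `ax`, calling `ax.boxplot(values, positions=ranges, patch_artist=True)` then draws one box per range.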
