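I have a list of values where each value is the amount of work done by a particular worker. I want to show that this varies a lot and that the series contains some outliers: some people work a lot, others very little. However, because the list is long and the values span a huge range from the lowest to the highest, a plot doesn't really show the spikes in the data.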
y=[12, 27, 1, 1, 2, 100, 67, 1, 17, 73, 20, 1, 5, 1, 192, 2, 4, 1, 2, 63, 1, 6, 62, 19, 1, 1, 9, 1, 380, 1, 1, 5, 101, 1, 39, 55, 42, 15, 10, 188, 16, 28, 1, 29, 15, 7, 8, 2, 13, 2336, 141, 1637, 10, 3, 6, 1, 62, 225, 1454, 1, 1, 1, 4, 8, 40, 2, 31, 1, 1, 474, 3, 15, 2, 8, 2, 1, 1, 259, 29, 5, 1, 2, 16, 5, 1060, 29, 5, 11, 2, 2428, 7, 31, 1476, 86, 5, 7, 22, 80, 18, 2, 6, 1, 9, 51, 1, 1, 1, 2, 21, 2918, 7, 17, 2, 99, 15, 3, 39, 1, 1, 20, 42, 7, 12, 1, 44, 3, 2, 5, 2, 1, 10, 262, 504, 60, 9, 2, 1, 2, 6, 3, 120, 48, 4, 6, 7, 3, 1, 174, 16, 2, 5, 1, 1, 1273, 66, 136, 1, 159, 1, 8, 3740, 161, 201, 4, 3, 4138, 29, 773, 1, 1, 1, 28, 2, 1, 6, 45, 32, 4, 1, 5, 6, 2006, 1, 9, 44, 91, 2, 20, 15, 1, 15, 1, 1, 9, 29, 38, 9, 422, 1, 4, 11475, 1, 5, 502, 10, 1, 16, 1, 1, 26, 2, 1, 3073, 3, 128, 56, 188, 159, 10, 6, 279, 148, 47, 66, 1702, 2, 6, 12, 15, 35, 2, 37, 3, 49, 8, 45, 3, 7, 29, 9, 2, 28, 1, 73, 2, 146, 1942, 20, 6, 1, 2, 936, 9, 5, 636, 6, 1, 4, 11, 3, 5, 15, 4, 2, 39, 42, 3, 1, 3, 3, 1, 60, 4, 4, 3, 3, 1, 1, 5, 6, 1, 42, 91, 93, 6, 89, 8, 149, 1, 1, 8, 2, 101, 83, 1778, 1, 8, 166, 19, 18, 103, 26, 2, 1206, 2, 6, 22, 5, 4, 2, 2, 4, 3, 128, 1, 6, 25, 409, 15, 1, 1, 76, 3, 116, 9, 3, 1, 215, 5, 30, 2, 1, 3, 105, 1, 4, 136, 638, 1842, 1, 3, 1, 1, 190, 12353, 12, 8, 7, 47, 9, 1, 132, 8, 3, 1, 1, 1871, 2, 6, 15, 4, 449, 14, 126, 29, 2, 1, 1, 1, 30, 2, 100, 2, 2, 44, 1, 3, 5, 1, 3, 3, 1, 133, 14, 57, 3089, 55, 142, 1, 89, 17, 1, 11, 562, 1, 1, 11, 75, 5, 13, 26, 2, 4, 1, 65, 124, 1, 3, 4, 5, 1, 32, 6, 15, 33, 2487, 28, 1, 36, 2, 3, 1926, 1, 1, 30, 3, 62, 23, 2, 5, 1098, 1, 5, 3, 1, 1, 1, 971, 1, 195, 32, 7, 9, 54, 127, 16, 227, 1, 1, 1, 110, 13, 2, 223, 80, 1, 638, 18, 1, 20, 1, 86, 33, 78, 1, 3, 3, 7, 1, 193, 259, 1, 20, 30, 2778, 2, 89, 174, 51, 507, 6, 9, 3, 22, 31, 148, 1098, 9, 76, 86, 366, 1, 61, 1, 116, 30, 147, 1, 12, 11, 4, 15, 1001, 22, 61, 86, 45, 344, 1, 1, 1, 18, 2, 9, 3, 69, 4, 7, 533, 3, 12, 123, 70, 1, 9, 3, 3, 27, 2, 1, 1]
I made a simple histogram, but it doesn't really show the peaks in the data. What is the best plot for showing this?
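import matplotlib
matplotlib.use('PS')  # non-interactive PostScript backend: nothing is displayed on screen
import matplotlib.pyplot as plt
plt.hist(y, alpha=0.5, label="work done per worker", color="blue")
plt.legend()
plt.savefig("work_per_worker.ps")  # with the PS backend the figure has to be saved to be seen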
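To see why the default histogram looks flat, it can help to measure how much of the data lands in the lowest bin. The sketch below assumes NumPy; `first_bin_share` is a hypothetical helper name, and the toy sample is made up for illustration. With an integer `bins`, `np.histogram` splits the range from min to max into equal-width bins, which is what `plt.hist` does with its default of 10 bins.

```python
import numpy as np

def first_bin_share(values, bins=10):
    """Return (fraction of values in the lowest bin, bin width).

    Hypothetical helper: uses equal-width bins spanning [min, max],
    the same kind of binning plt.hist uses by default.
    """
    counts, edges = np.histogram(values, bins=bins)
    return counts[0] / counts.sum(), edges[1] - edges[0]

# Hypothetical toy sample: four small values and one very large one.
share, width = first_bin_share([1, 1, 2, 3, 1000])
print(f"{share:.0%} of values in the first bin (bin width {width:.1f})")
```

A single large value stretches every bin so wide that almost all the small counts collapse into one bar, which is the effect described in the question.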
Answer 0 (score: 2)
Using logarithmically spaced bins is a good option.
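import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 2)
ax[0].hist(y)  # default: 10 equal-width bins
ax[0].set_title('Original')
# Log-spaced edges from 10**0 up to the next power of ten above max(y);
# np.logspace(0, 4, 20) stops at 10**4 and would silently drop the value 12353.
ax[1].hist(y, bins=np.logspace(0, np.ceil(np.log10(max(y))), 20))
ax[1].set_xscale('log')
ax[1].set_title('Log bins')
plt.show()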
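Values outside the outermost bin edges are not counted by `np.histogram` (and therefore not by `plt.hist`), so fixed log-spaced edges can quietly lose the largest workers. A minimal sketch, assuming NumPy and values that are all at least 1 (true for the list in the question); `log_bin_edges` is a hypothetical helper and the sample is made up.

```python
import numpy as np

def log_bin_edges(values, n_edges=20):
    """Log-spaced edges from 10**0 to the next power of ten above max(values).

    Hypothetical helper; assumes every value is >= 1.
    """
    values = np.asarray(values, dtype=float)
    if values.min() < 1:
        raise ValueError("log bins starting at 10**0 need values >= 1")
    upper = np.ceil(np.log10(values.max()))
    return np.logspace(0, upper, n_edges)

sample = [1, 3, 40, 900, 12353]  # hypothetical values; 12353 is above 10**4
counts_fixed, _ = np.histogram(sample, bins=np.logspace(0, 4, 20))
counts_auto, _ = np.histogram(sample, bins=log_bin_edges(sample))
print(counts_fixed.sum(), counts_auto.sum())  # 4 5
```

Deriving the upper edge from the data keeps every value in some bin; rounding up to a whole power of ten also gives tidy tick labels on the log axis.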
Answer 1 (score: 1)
If you want to see the peaks in the order the values are given, I suggest a regular line plot:
import matplotlib.pyplot as plt
plt.plot(y)
Or, if you just want to see the individual points:
plt.plot(y, marker='.', linestyle='')
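When the values are plotted against their position, it can help to pick out the tallest spikes so they can be labelled. A minimal sketch, assuming NumPy; `top_k_peaks` is a hypothetical helper and the input list is made up.

```python
import numpy as np

def top_k_peaks(values, k=5):
    """Return (index, value) pairs for the k largest values, in list order.

    Hypothetical helper for labelling the tallest spikes in a plot of
    values against their index.
    """
    values = np.asarray(values)
    idx = np.argsort(values)[-k:]  # indices of the k largest values
    idx = np.sort(idx)             # restore the original (worker) order
    return [(int(i), int(values[i])) for i in idx]

print(top_k_peaks([5, 900, 2, 40, 3000, 7, 120], k=3))
# [(1, 900), (4, 3000), (6, 120)]
```

Each pair could then be passed to `plt.annotate(str(v), (i, v))` after `plt.plot(y)`; adding `plt.yscale('log')` keeps the many small values from being flattened against the x-axis.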
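If you want to see how far individual values sit from the bulk of the data, use a box plot. The box spans the first to third quartile with a line at the median, and points beyond the whiskers are drawn individually as outliers: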
plt.boxplot(y)
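The points drawn individually by a box plot follow the interquartile-range rule: with the default `whis=1.5`, anything beyond 1.5 times the IQR outside the box is shown as a flier. The sketch below approximates that rule with NumPy so the flagged workers can be listed numerically; `iqr_outliers` is a hypothetical helper, and matplotlib's exact quartile computation may differ slightly on some inputs.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return values outside [Q1 - whis*IQR, Q3 + whis*IQR].

    Hypothetical helper approximating how plt.boxplot picks fliers
    with its default whis=1.5.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < low) | (values > high)]

print(iqr_outliers([1, 2, 3, 4, 100]))  # [100.]
```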
Or log-transform the data first, so the box is not squashed by the extreme values:
import numpy as np
plt.boxplot(np.log(y))  # natural log; safe here because every value in y is >= 1
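`np.log` returns `-inf` (with a warning) for a zero, so a worker who did no work would break the transformed box plot. A minimal sketch of a guarded transform, assuming NumPy; `log10_for_boxplot` is a hypothetical helper, and base 10 is swapped in only because the transformed values are then easier to read as orders of magnitude.

```python
import numpy as np

def log10_for_boxplot(values):
    """log10-transform strictly positive counts for a box plot.

    Hypothetical helper: rejects zero or negative values instead of
    letting them turn into -inf or nan. np.log1p is one alternative
    if zero-work entries need to be kept.
    """
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("log transform needs strictly positive values")
    return np.log10(values)

print(log10_for_boxplot([1, 10, 100, 1000]))
```

The result can be passed straight to `plt.boxplot`; a value of 3 on the axis then reads as "about 1000 units of work".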