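Let's say I want to create a boxplot of a list containing the numbers 1-5, each about a million times.
Such a list would have about 5,000,000 elements, but it can be represented as a dictionary that takes almost no space: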
s = {1: 1000000, 2: 1000000, 3: 1000000, 4: 1000000, 5:1000000}
The problem is that if I try to create a boxplot of that dict, I get this error:
Traceback (most recent call last):
  File "<pyshell#17>", line 1, in <module>
    ax.boxplot(s)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/axes.py", line 5462, in boxplot
    if not hasattr(x[0], '__len__'):
KeyError: 0
Is there a neat way to plot the dictionary s without having to put all the elements in a list?
A comment suggested that I try
boxplot(n for n, count in s.iteritems() for _ in xrange(count))
but this resulted in
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    boxplot(n for n, count in s.iteritems() for _ in xrange(count))
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2134, in boxplot
    ret = ax.boxplot(x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap)
  File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/axes.py", line 5462, in boxplot
    if not hasattr(x[0], '__len__'):
TypeError: 'generator' object has no attribute '__getitem__'
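Answer 0 (score: 4)
The whole point of describing data with a picture is to get a feel for the data as a whole, not to be highly precise. So there is little harm in shrinking the data by generating one representative data point for every 1000 actual data points: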
x = [val for val, num in s.items() for i in range(num//1000)]
That should be good enough for the naked eye:
import matplotlib.pyplot as plt
import numpy as np
s = {1: 1000000, 2: 1000000, 3: 1000000, 4: 1000000, 5:1000000}
x = [val for val, num in s.items() for i in range(num//1000)]
dct = plt.boxplot(x)
plt.show()
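To check that the scaled-down list keeps the shape of the data, its quartiles can be compared with quartiles computed directly from the counts. This is only a minimal sketch: `weighted_quantile` is a hypothetical helper, not part of matplotlib or NumPy, and it assumes a quantile is defined as the smallest value whose cumulative share of the counts reaches `q`.

```python
import numpy as np

def weighted_quantile(counts, q):
    """Hypothetical helper: smallest value whose cumulative share reaches q."""
    values = np.array(sorted(counts))
    freqs = np.array([counts[v] for v in values], dtype=float)
    cum = np.cumsum(freqs) / freqs.sum()
    return values[np.searchsorted(cum, q)]

s = {1: 1000000, 2: 1000000, 3: 1000000, 4: 1000000, 5: 1000000}
# one representative point per 1000 real points, as in the answer
x = [val for val, num in s.items() for i in range(num // 1000)]

# quartiles straight from the counts vs. quartiles of the reduced list
exact = [weighted_quantile(s, q) for q in (0.25, 0.5, 0.75)]
approx = [np.percentile(x, 100 * q) for q in (0.25, 0.5, 0.75)]
print(exact, approx)
```

Note that counts below 1000 vanish entirely under `num // 1000`, so rare values (potential outliers) would not appear in the plot.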
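Answer 1 (score: 2)
As far as I know, matplotlib has no method for data in this form. Basically, you have to compute the relevant statistics and implement your own way of drawing the boxplot. This might get you started: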
import matplotlib.pyplot as plt
import numpy as np

s = [{1: 1000000, 2: 1000000, 3: 1000000, 4: 1000000, 5: 1000000},
     {1: 1000000, 0: 1000000, 8: 1000000, 3: 1000000, 7: 1000000}]

def boxplot(data, x=0):
    # list() is needed on Python 3, where dict.items() returns a view
    pairs = np.array(list(data.items()))
    # sort whole rows by value so each count stays paired with its value
    pairs = pairs[np.argsort(pairs[:, 0])]
    values = pairs[:, 0]
    # cumulative relative frequencies
    freqs = np.cumsum(pairs[:, 1])
    freqs = freqs * 1. / freqs[-1]
    # get 25%, 50%, 75% percentiles
    idx = np.searchsorted(freqs, [0.25, 0.5, 0.75])
    p25, p50, p75 = values[idx]
    vmin, vmax = values.min(), values.max()
    l, r = -0.2 + x, 0.2 + x
    # plot median, box, and whiskers (whiskers reach the min and max)
    plt.plot([l, r], [p50, p50], 'k')
    plt.plot([l, r, r, l, l], [p25, p25, p75, p75, p25], 'k')
    plt.plot([x, x], [p75, vmax], 'k')
    plt.plot([x, x], [p25, vmin], 'k')

for i in range(len(s)):
    boxplot(s[i], i)
plt.xlim(-0.5, 1.5)
plt.show()
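Recent matplotlib versions also provide `Axes.bxp`, which draws boxes from precomputed statistics, so the hand-drawn lines are not strictly necessary. The following is a hedged sketch rather than a definitive implementation: `count_stats` is a hypothetical helper, and, as in the code above, it assumes the whiskers should reach the minimum and maximum values instead of following the usual 1.5 IQR rule.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

def count_stats(counts):
    """Hypothetical helper: box statistics from a {value: count} dict."""
    values = np.array(sorted(counts))
    cum = np.cumsum([counts[v] for v in values], dtype=float)
    cum /= cum[-1]  # cumulative relative frequencies
    q1, med, q3 = values[np.searchsorted(cum, [0.25, 0.5, 0.75])]
    return {
        "q1": float(q1), "med": float(med), "q3": float(q3),
        # assumption: whiskers span the full range of the data
        "whislo": float(values[0]), "whishi": float(values[-1]),
        "fliers": [],
    }

s = [{1: 1000000, 2: 1000000, 3: 1000000, 4: 1000000, 5: 1000000},
     {1: 1000000, 0: 1000000, 8: 1000000, 3: 1000000, 7: 1000000}]

stats = [count_stats(d) for d in s]
fig, ax = plt.subplots()
artists = ax.bxp(stats, positions=[0, 1], showfliers=False)
# plt.show()  # or fig.savefig(...) when using a non-interactive backend
```

Because only the five summary numbers per dictionary are passed to matplotlib, the memory cost stays independent of the counts.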