Question

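I would like to plot histograms with matplotlib. However, because of the huge data I pass to the hist() function (lists of about 100,000 numbers), an error occurs when I plot two figures. Plotting either one of the two figures on its own works fine. Can anyone help me solve this? Thanks in advance.

Here is the simplified code that produces the error: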

f_120 = plt.figure(1)
plt.hist(taccept_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'b', label = 'accepted answer')
plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
plt.axvline(x = 30, ymin = 0, ymax = 1, color = 'r', linestyle = '--', label = '30 min')
plt.axvline(x = 60, ymin = 0, ymax = 1, color = 'c', linestyle = '--', label = '1 hour')
plt.legend()

plt.ylabel('Percentage of answered questions')
plt.xlabel('Minutes elapsed after questions are posted')
plt.title('Cumulative histogram: time elapsed \n before questions receive answer (first 2 hrs)')
plt.ylim(0,1)
plt.xlim(0,120)
f_120.show()
f_120.savefig('7month_0_120.png', format = 'png' )
plt.close()

f_2640 = plt.figure(2)
plt.hist(taccept_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'b', label = 'accepted answer')
plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
plt.axvline(x = 240, ymin = 0, ymax = 1, color = 'r', linestyle = '--', label = '4 hours')
plt.axvline(x = 1440, ymin = 0, ymax = 1, color = 'c', linestyle = '--', label = '1 day')
plt.legend(loc= 4)

plt.ylabel('Percentage of answered questions')
plt.xlabel('Minutes elapsed after questions are posted')
plt.title('Cumulative histogram: time elapsed \n before questions receive answer (first 48)')
plt.ylim(0,1)
plt.xlim(0,2640)
f_2640.show()
f_2640.savefig('7month_0_2640.png', format = 'png' )

Here are the error details:

    plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
  File "C:\software\Python26\lib\site-packages\matplotlib\pyplot.py", line 2160, in hist
    ret = ax.hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, **kwargs)
  File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 7775, in hist
    closed=False, edgecolor=c, fill=False) )
  File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 6384, in fill
    for poly in self._get_patches_for_fill(*args, **kwargs):
  File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 317, in _grab_next_args
    for seg in self._plot_args(remaining, kwargs):
  File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 304, in _plot_args
    seg = func(x[:,j%ncx], y[:,j%ncy], kw, kwargs)
  File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 263, in _makefill
    (x[:,np.newaxis],y[:,np.newaxis])),
  File "C:\software\Python26\lib\site-packages\numpy\core\shape_base.py", line 270, in hstack
    return _nx.concatenate(map(atleast_1d,tup),1)
MemoryError

Answer 1

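As others have said, six million bins does not sound very useful. But one simple thing to do is to reuse the same figure: since the only plot elements that change are things other than the histograms, try something like this: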

vline1 = plt.axvline(...)
vline2 = plt.axvline(...)
lgd = plt.legend()

Then, instead of closing the figure after savefig and plotting the histograms again, reuse it and change only what needs to change:

# change vline1 and vline2 positions and labels
vline1.set_data([240,240],[0,1])
vline1.set_label('new label')
vline2.set_data(...)
vline2.set_label(...)
# remove old legend, replace with new
lgd.remove()
lgd = plt.legend(loc=4)
plt.xlabel('new xlabel')
# etc

Finally, call savefig again with the new file name.
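
A minimal end-to-end sketch of this reuse pattern, under a few assumptions that are not in the original post: synthetic exponential data stands in for `taccept_list`, one-minute bin edges replace the six million bins, the `Agg` backend is selected so it runs without a display, and the output file names are placeholders. `density=True` is the current spelling of the `normed=True` argument used in the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic stand-in for the asker's answer times (minutes).
rng = np.random.default_rng(0)
taccept_list = rng.exponential(scale=300.0, size=100_000)

fig, ax = plt.subplots()
# Draw the expensive histogram only once.
ax.hist(taccept_list, bins=np.arange(0, 2641), density=True, histtype="step",
        cumulative=True, color="b", label="accepted answer")
vline1 = ax.axvline(x=30, color="r", linestyle="--", label="30 min")
vline2 = ax.axvline(x=60, color="c", linestyle="--", label="1 hour")
lgd = ax.legend()
ax.set_ylim(0, 1)
ax.set_xlim(0, 120)
fig.savefig("reuse_0_120.png")

# Second view: move the markers, relabel them, rebuild the legend, rezoom.
vline1.set_data([240, 240], [0, 1])
vline1.set_label("4 hours")
vline2.set_data([1440, 1440], [0, 1])
vline2.set_label("1 day")
lgd.remove()
lgd = ax.legend(loc=4)
ax.set_xlim(0, 2640)
fig.savefig("reuse_0_2640.png")
plt.close(fig)
```

The histogram artists are created a single time, so the second image costs only a redraw rather than a second set of polygons in memory.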

Answer 2

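You are plotting six million bins and then zooming in on (presumably) a small part of them. With two lines per figure, that is 12 million data points, so I am not surprised that matplotlib crashes once you try to plot another 12 million data points in the next figure. I very much doubt you really need six million bins, so let's try to cut your histogram down to a more manageable size!

Suppose your data spans the 44 or 48 hours you want to look at. Then six million bins means your data has a resolution of about 30 milliseconds. That seems unreasonable given that you display at a resolution of minutes. Alternatively, you have a resolution of seconds, so six million bins means your data spans 70 days, but you are only looking at two of them.

Let's assume you are interested in two days of data with a resolution of seconds or minutes.

Although you specify bins as a number of bins, you can also pass a sequence of values to use as the bin edges. So, for your first plot, you could say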
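
The two estimates above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope check; the 48-hour span and the one-second bin width are the two hypothetical cases from the paragraph, not values measured from the asker's data.

```python
BINS = 6_000_000

# Case 1: the data spans 48 hours -> width of one bin in milliseconds.
span_seconds = 48 * 60 * 60
bin_width_ms = span_seconds / BINS * 1000
print(f"bin width over 48 h: {bin_width_ms:.1f} ms")   # 28.8 ms

# Case 2: each bin is one second wide -> total span in days.
span_days = BINS / (24 * 60 * 60)
print(f"span at 1 s per bin: {span_days:.1f} days")    # 69.4 days
```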

plt.hist(taccept_list, bins=range(120), normed = True, histtype ="step", cumulative = True, color = 'b', label = 'accepted answer')
plt.hist(tfirst_list, bins=range(120), normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')

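which gives a resolution of minutes over the first 120 minutes. The histogram will ignore anything above 120, which is fine, since you are not showing it in your plot anyway.

A resolution of seconds could be: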

numpy.linspace(0,120,7200)

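Now the number of points in your histogram is much more reasonable, and probably more in line with the data you are looking at and displaying.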
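
The two bin choices can be checked without plotting anything. The sketch below uses `np.histogram`, which matplotlib's `hist` calls to do the counting, on synthetic data (an assumption standing in for `taccept_list`). It builds the edges with `np.arange(0, 121)` and `np.linspace(0, 120, 7201)` so that the last edge is exactly 120 and the bins are exactly one minute and one second wide; `range(120)` stops at 119. It also illustrates one caveat: once the edges stop at 120, a normalized cumulative histogram is scaled over only the values inside that range, so passing `weights` of `1/len(data)` is one way to keep the curve a fraction of all questions.

```python
import numpy as np

# Assumption: synthetic answer times in minutes, standing in for taccept_list.
rng = np.random.default_rng(0)
data = rng.exponential(scale=300.0, size=100_000)

minute_edges = np.arange(0, 121)          # 121 edges -> 120 one-minute bins
second_edges = np.linspace(0, 120, 7201)  # 7201 edges -> 7200 one-second bins

# Values above the last edge are simply not counted.
counts, _ = np.histogram(data, bins=minute_edges)
inside = np.count_nonzero((data >= 0) & (data <= 120))

# Weighting each sample by 1/N makes the cumulative sum a fraction of ALL
# samples, not just of those that fall inside the 0-120 minute window.
weights = np.full(data.size, 1.0 / data.size)
frac, _ = np.histogram(data, bins=minute_edges, weights=weights)
cumulative = np.cumsum(frac)

print(len(minute_edges) - 1, len(second_edges) - 1)  # number of bins
print(counts.sum() == inside)
print(cumulative[-1])  # fraction answered within 120 minutes (below 1)
```

The same `bins` and `weights` arguments can be passed to `plt.hist` together with `cumulative=True` and `histtype="step"`, leaving out the normalization flag.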

Memory error when dealing with huge data
