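I'm trying to plot a histogram in Python, but I run out of memory because the text file I'm working with contains too many numbers. My text file is formatted as follows: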
size - reads - writes
size is a single number; reads and writes are each many numbers separated by spaces.
For example: 10 - 1 23 245 2567 - 2 32 342 1231
If it matters, the reads and writes are already sorted.
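Since the reads and writes are sorted, each line's numbers could be binned in a single forward pass that keeps only one counter per bin, so the numbers themselves never need to be stored. The sketch below assumes the bin edges are known in advance (for sorted data, a field's smallest and largest values are simply its first and last numbers, which a cheap first pass could collect); `count_sorted` is a hypothetical helper name, and its bin convention is copied from `numpy.histogram`. It relies on the input really being in ascending order.

```python
def count_sorted(values, edges):
    """Count ascending `values` into the bins defined by ascending `edges`.

    Hypothetical helper (not a library function). Bins follow numpy.histogram's
    convention: [edges[i], edges[i+1]) for every bin except the last, which
    also includes edges[-1]. Values outside [edges[0], edges[-1]] are skipped.
    Only len(edges) - 1 counters are kept, so `values` can be a generator
    whose numbers are never stored.
    """
    nbins = len(edges) - 1
    counts = [0] * nbins
    i = 0  # current bin; it only moves forward because the values are sorted
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # Advance to the bin whose upper edge lies above v
        while i < nbins - 1 and v >= edges[i + 1]:
            i += 1
        counts[i] += 1
    return counts
```

For example, `count_sorted((int(s) for s in "1 23 245 2567".split()), [0, 100, 1000, 3000])` returns `[2, 1, 1]`.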
The text file contains only 10 lines of data to plot, but each line is very long. The whole text file is several gigabytes.
I make two plots at a time, one for reads and one for writes. Each plot looks like http://matplotlib.org/1.3.0/_images/histogram_demo_extended_06.png, but with 10 bars per bin instead of 3.
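For what it's worth, `ax.hist` does not need the raw numbers to draw this grouped layout: given per-bin counts computed some other way, passing each bin's centre once, weighted by that bin's count, reproduces the same side-by-side bars. A minimal sketch, assuming every line's counts share the same `edges`; `plot_grouped_counts` and its parameters are hypothetical, and the axis labels, legend placement and log scale simply mirror the question's script.

```python
import matplotlib
matplotlib.use("Agg")  # render to files, as the question's script does
import matplotlib.pyplot as plt
import numpy as np


def plot_grouped_counts(edges, counts_per_line, labels, title, filename=None):
    """Hypothetical helper: draw one bar per data line in every bin from
    precomputed counts, like ax.hist with a list of datasets."""
    edges = np.asarray(edges, dtype=float)
    centers = (edges[:-1] + edges[1:]) / 2
    fig, ax = plt.subplots()
    # One sample per bin centre, weighted by that bin's count, gives each
    # bin exactly the precomputed height.
    n, _, _ = ax.hist([centers] * len(counts_per_line), bins=edges,
                      weights=[np.asarray(c, dtype=float) for c in counts_per_line],
                      label=labels, log=True)
    ax.set_title(title)
    ax.set_xlabel("instruction #")
    ax.set_ylabel("# of accesses")
    lgd = ax.legend(ncol=1, title="maxSize", loc="center left", bbox_to_anchor=(1, 0.5))
    if filename is not None:
        fig.savefig(filename, bbox_extra_artists=(lgd,), bbox_inches="tight")
    return fig, n
```

The bar heights come only from the counts, so memory use depends on the number of bins and lines, not on how many numbers each line holds.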
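The problem is that I run out of memory. Is there a way to read the elements of a single line of the text file one at a time, instead of taking in the whole line? I have searched Stack Overflow for something similar, but what I found differs a bit from my problem: in my case a single line contains a huge amount of data, and I want a histogram with several bars in each bin. I'm not very familiar with Python, since this is the first thing I've done with it. Thanks a lot.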
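One way to read a line's elements one at a time is to pull the file in fixed-size chunks with `f.read(n)` instead of iterating line by line, split each chunk into tokens, and carry over any number that was cut in half at a chunk boundary. A minimal Python 3 sketch; the name `iter_tokens` and the `"\n"` line-break marker are illustrative conventions, not a library API.

```python
import io


def iter_tokens(f, chunk_size=1 << 20):
    """Yield whitespace-separated tokens from text file `f`, plus "\n" at
    each line break, reading at most `chunk_size` characters at a time.

    Only the current chunk and one partial token are held in memory, never
    a whole line. The last line yields no "\n" if the file lacks a final newline.
    """
    carry = ""  # incomplete token left over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        pieces = (carry + chunk).split("\n")
        # Every piece except the last was terminated by a newline
        for piece in pieces[:-1]:
            yield from piece.split()
            yield "\n"
        last = pieces[-1]
        # Emit complete tokens; keep whatever follows the last blank
        cut = max(last.rfind(" "), last.rfind("\t"))
        yield from last[:cut + 1].split()
        carry = last[cut + 1:]
    yield from carry.split()


if __name__ == "__main__":
    demo = io.StringIO("10 - 1 23 245 2567 - 2 32 342 1231\n")
    print(list(iter_tokens(demo, chunk_size=4)))
```

With the real file this would be used as `with open("x264.data") as f: for tok in iter_tokens(f): ...`, where `"-"` separates the size, reads and writes fields and `"\n"` ends a line.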
Here is my code, which runs out of memory on the large text file:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib
matplotlib.use("Agg")
import re  # regular expressions
import matplotlib.pyplot as plt  # plotting
import numpy as np
datafile = "x264.data"
benchmark_name = "x264"
# One [maxSize, reads, writes] entry per data line
stuff = []
with open(datafile, 'r') as ifile:
    # Note: iterating over the file reads each whole (very long) line into memory,
    # and the lists below store every number as a separate Python int object.
    for line in ifile:
        # Remove whitespace
        line = line.strip()
        if not line:
            # Skip empty lines
            continue
        if line[0] == "$":
            print("something is wrong")
        if line[0] == "&":
            print("some missing alloc routine not accounted for")
        if line[0] not in "0123456789":
            continue
        # Line starts with a number, therefore data
        # Split data using the delimiter
        column_data = re.split("-", line)
        # Remove any leftover whitespace
        column_data = [string_data.strip() for string_data in column_data]
        # The size field, kept as a string because it is only used as a legend label
        maxSize = str(column_data[0])
        reads = column_data[1].strip()
        reads = [int(s) for s in re.split(r'\s+', reads)]
        writes = column_data[2].strip()
        writes = [int(s) for s in re.split(r'\s+', writes)]
        stuff += [[maxSize, reads, writes]]
###############################################################################
# PLOTTING
# See http://matplotlib.org/examples/api/barchart_demo.html
# for info on making barcharts
# Size plotting data
common_params = dict(bins=20, label=[s[0] for s in stuff[0:10]])
reads = plt.figure()
ax1 = reads.add_subplot(111)
ax1.set_title(benchmark_name + " reads")
ax1.set_xlabel("instruction #")
ax1.set_ylabel("# of accesses")
read_data = [s[1] for s in stuff[0:10]]
write_data = [s[2] for s in stuff[0:10]]
ax1.hist(read_data, **common_params)
ax1.set_yscale('log', nonpositive='clip')  # use nonposy='clip' on matplotlib < 3.3
lgd=ax1.legend(ncol = 1, title = "maxSize", loc='center left', bbox_to_anchor=(1, 0.5))
#ax1.plt.savefig(benchmark_name + " reads.png")
reads.savefig(benchmark_name + " reads.png",bbox_extra_artists=(lgd,), bbox_inches='tight')
writes = plt.figure()
ax2 = writes.add_subplot(111)
ax2.set_title(benchmark_name + " writes")
ax2.set_xlabel("instruction #")
ax2.set_ylabel("# of accesses")
ax2.hist(write_data, **common_params)
ax2.set_yscale('log', nonpositive='clip')  # use nonposy='clip' on matplotlib < 3.3
lgd=ax2.legend(ncol = 1, title = "maxSize", loc='center left', bbox_to_anchor=(1, 0.5))
writes.savefig(benchmark_name + " writes.png",bbox_extra_artists=(lgd,), bbox_inches='tight')
plt.show()
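As a sketch of how the parsing loop above could avoid storing any line's numbers: gather them into small batches and fold each batch into fixed-edge histograms with `numpy.histogram`, so memory is bounded by the batch size rather than by the line length. This is a hedged Python 3 sketch that assumes every line is a well-formed `size - reads - writes` record (the `$`/`&` checks are omitted), that one set of bin edges is chosen in advance for both reads and writes, and that `tokens` yields whitespace-separated strings with `"\n"` marking line ends; `histogram_tokens` is a hypothetical name.

```python
import numpy as np


def histogram_tokens(tokens, edges, batch_size=1_000_000):
    """Hypothetical helper: per-line read/write histograms from a token stream.

    "-" separates the size, reads and writes fields and "\n" ends a line.
    Returns a list of (size, read_counts, write_counts) tuples.
    """
    nbins = len(edges) - 1
    results = []
    size = None
    field = 0       # 0 = size, 1 = reads, 2 = writes
    counts = None   # [read_counts, write_counts] for the current line
    batch = []      # numbers not yet folded into counts

    def flush():
        # Fold the buffered numbers into the histogram of the current field
        if batch:
            h, _ = np.histogram(np.array(batch, dtype=np.int64), bins=edges)
            counts[field - 1] += h
            batch.clear()

    for tok in tokens:
        if tok == "\n":
            flush()
            if size is not None:
                results.append((size, counts[0], counts[1]))
            size, field, counts = None, 0, None
        elif tok == "-":
            flush()
            field += 1
        elif field == 0:
            size = tok  # kept as a string, used as the legend label
            counts = [np.zeros(nbins, dtype=np.int64), np.zeros(nbins, dtype=np.int64)]
        else:
            batch.append(int(tok))
            if len(batch) >= batch_size:
                flush()
    # Handle a final line that has no trailing newline
    flush()
    if size is not None:
        results.append((size, counts[0], counts[1]))
    return results
```

For a small in-memory test, `re.findall(r"\S+|\n", text)` produces a suitable token list; for the multi-gigabyte file the tokens would need to come from a reader that does not load whole lines.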