Question

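I am trying to plot histograms in Python, but I run out of memory because the text file I am working with contains too many numbers. The text file has the following format: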

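size - reads - writes

size is a single number; reads and writes are each many numbers separated by spaces.

For example: 10 - 1 23 245 2567 - 2 32 342 1231

In case it matters, the reads and writes are already sorted.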

The text file contains only 10 lines of data to plot, but each line is very long. The whole text file is several gigabytes.

I am making two plots at a time, one for reads and one for writes. Each plot looks like http://matplotlib.org/1.3.0/_images/histogram_demo_extended_06.png but with 10 bars per bin instead of 3.

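The problem is that I run out of memory. Is there a way to read the elements of a single line one at a time from the text file, instead of pulling in the whole line? I have searched Stack Overflow for similar questions, but they differ somewhat from my problem, where a single line holds a huge amount of data and I want a histogram with multiple bars in each bin. I am not very familiar with Python, as this is the first thing I have done with it. Thank you very much.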
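To make the "one element at a time" idea concrete, here is a minimal sketch, assuming Python 3 and a file that contains only data lines in the format above. It reads the file in fixed-size chunks rather than by line, so a multi-gigabyte line is never held in memory at once. The names `iter_tokens`, `iter_values`, and `chunk_size` are made up for illustration and do not come from any library.

```python
import io
import re

# A token is a run of non-whitespace characters, or a newline (kept as a
# marker so the caller can tell where one line of the file ends).
TOKEN = re.compile(r"\S+|\n")


def iter_tokens(fileobj, chunk_size=1 << 20):
    """Yield tokens from a text file without ever loading a whole line."""
    tail = ""  # possibly incomplete token carried over from the previous chunk
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        # Hold back trailing non-whitespace: it may continue in the next chunk.
        cut = len(buf)
        while cut > 0 and not buf[cut - 1].isspace():
            cut -= 1
        tail = buf[cut:]
        for match in TOKEN.finditer(buf, 0, cut):
            yield match.group()
    if tail:
        yield tail


def iter_values(tokens):
    """Yield (line_index, field_index, value); field 0=size, 1=reads, 2=writes.

    Assumption: every line is a data line, so each token is "-", a newline
    marker, or an integer.
    """
    line_index = field_index = 0
    for token in tokens:
        if token == "\n":
            line_index += 1
            field_index = 0
        elif token == "-":
            field_index += 1
        else:
            yield line_index, field_index, int(token)


# Tiny in-memory stand-in for the real file, read 4 characters at a time.
sample = io.StringIO("10 - 1 23 245 2567 - 2 32 342 1231\n20 - 5 - 6\n")
values = list(iter_values(iter_tokens(sample, chunk_size=4)))
```

With a real file, `open(datafile)` would take the place of the `io.StringIO` object, and the values would be consumed in a loop rather than collected into a list.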

Here is my current code, which runs out of memory on the large text file:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib
matplotlib.use("Agg")
import re                       # regular expressions
import matplotlib.pyplot as plt # plotting

import numpy as np

datafile = "x264.data"
benchmark_name = "x264"
# Set up a list to hold the parsed rows: [maxSize, reads, writes]
stuff = []

with open(datafile,'r') as ifile:
    for line in ifile:

        # Remove whitespace
        line = line.strip()

        if not line:
            # Skip empty lines
            continue

        if line[0] == "$":
            print("something is wrong")

        if line[0] == "&":
            print("some missing alloc routine not accounted for")

        if line[0] not in "0123456789":
            continue

        # Line starts with a number, therefore data

        # Split data using the delimiter
        column_data = re.split("-", line)
        # Remove any leftover whitespace
        column_data = [string_data.strip() for string_data in column_data]

        # Keep the size as a string; it is used as the legend label
        maxSize = str(column_data[0])

        # Parse the space-separated read and write values as integers
        reads = [int(s) for s in column_data[1].split()]
        writes = [int(s) for s in column_data[2].split()]

        stuff.append([maxSize, reads, writes])


###############################################################################
# PLOTTING

# See http://matplotlib.org/examples/api/barchart_demo.html
# for info on making barcharts

# Size plotting data


common_params = dict(bins=20, label=[s[0] for s in stuff[0:10]])

reads = plt.figure()
ax1 = reads.add_subplot(111)

ax1.set_title(benchmark_name + " reads")
ax1.set_xlabel("instruction #")
ax1.set_ylabel("# of accesses")


read_data = [s[1] for s in stuff[0:10]]
write_data = [s[2] for s in stuff[0:10]]

ax1.hist(read_data, **common_params)
ax1.set_yscale('log',nonposy='clip')

lgd=ax1.legend(ncol = 1, title = "maxSize", loc='center left', bbox_to_anchor=(1, 0.5))

#ax1.plt.savefig(benchmark_name + " reads.png")
reads.savefig(benchmark_name + " reads.png",bbox_extra_artists=(lgd,), bbox_inches='tight')
writes = plt.figure()
ax2 = writes.add_subplot(111)

ax2.set_title(benchmark_name + " writes")
ax2.set_xlabel("instruction #")
ax2.set_ylabel("# of accesses")

ax2.hist(write_data, **common_params)
ax2.set_yscale('log',nonposy='clip')


lgd=ax2.legend(ncol = 1, title = "maxSize", loc='center left', bbox_to_anchor=(1, 0.5))

writes.savefig(benchmark_name + " writes.png",bbox_extra_artists=(lgd,), bbox_inches='tight')

plt.show()
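
The code above keeps every value of every line in lists before calling `hist`, which is where the memory goes. One possible alternative, sketched here under assumptions rather than offered as a tested fix, is to count values into fixed bins batch by batch with `numpy.histogram` and then draw the counts with `Axes.bar`. The sketch assumes Python 3 and that the bin edges are chosen in advance (for example from a known upper bound or a first pass over the file), since the same edges must be used for all 10 series. The names `accumulate_histogram`, `plot_grouped`, and `batch_size` are hypothetical.

```python
import numpy as np


def accumulate_histogram(values, bin_edges, batch_size=1000000):
    """Count an iterable of numbers into fixed bins, one batch at a time."""
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    batch = []
    for value in values:
        batch.append(value)
        if len(batch) >= batch_size:
            counts += np.histogram(batch, bins=bin_edges)[0]
            batch = []
    if batch:  # leftover values that did not fill a whole batch
        counts += np.histogram(batch, bins=bin_edges)[0]
    return counts


def plot_grouped(ax, bin_edges, counts_per_series, labels):
    """Draw one bar per series, side by side inside each bin, on a given Axes."""
    bin_edges = np.asarray(bin_edges, dtype=float)
    n_series = len(counts_per_series)
    bar_width = np.diff(bin_edges) / n_series
    for i, (counts, label) in enumerate(zip(counts_per_series, labels)):
        ax.bar(bin_edges[:-1] + i * bar_width, counts,
               width=bar_width, align="edge", label=label)


# Small demonstration with the example reads from the question.
# The range 0..3000 is an assumed bound, split into 3 equal-width bins.
edges = np.linspace(0, 3000, 4)
demo_counts = accumulate_histogram([1, 23, 245, 2567], edges, batch_size=2)
```

Only one batch of values and one small count array per series are alive at any time, so memory use no longer depends on the length of a line. `plot_grouped` would be called once per figure with the 10 count arrays and the `maxSize` labels.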

Plotting a large amount of data in a Python histogram

0 answers: