Python multiprocessing MemoryError

Asked: 2015-02-19 11:46:06

Tags: python multiprocessing

I am using map_async to spread my workload across processes, but I keep getting a MemoryError and cannot find a solution or workaround. This is the error I get:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 380, in _handle_results
    task = get()
MemoryError

Here is the code:

pool = Pool(maxtasksperchild=2)
fixed_args = (targetdirectorytxt, value_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PPD_star, list(varg), chunksize=1)
while not op_list.ready():
    print("Number of files left to process: {}".format(op_list._number_left))
    time.sleep(600)
op_list = op_list.get()
pool.close()
pool.join()

Here is what I have tried:

  • Reducing the number of processes
  • Limiting maxtasksperchild
  • apply_async instead of map_async

Are there any suggestions for avoiding this error?

This is how I am reading the files:

with open(os.path.join(txtdatapath,pathfilename), "r") as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for record in datalines:
        pass  # process each record here

1 Answer:

Answer 0 (score: 0)

I agree with @AndréLaszio, the file is probably too large to hold in memory. Changing your logic so that only one line is kept in memory at a time should relieve the memory pressure, unless each individual line is itself very large.

Here is an alternative approach that opens the file and consumes it one line at a time. Holding the entire file's contents in memory as a list is an expensive operation.

readingfiles.py

from memory_profiler import profile


@profile
def open_file_read_all_then_parse(filename):
    """
    Open a file and strip the newline characters from each line, then
    iterate over the resulting sequence of clean lines.
    """
    with open(filename, "r") as data:
        datalines = (line.rstrip('\r\n') for line in data)
        for record in datalines:
            pass


@profile
def open_file_read_and_parse(filename):
    """
    Open a file and iterate over its lines directly, stripping each
    record of any newline characters.
    """
    with open(filename, "r") as data:
        for record in data:
            record = record.rstrip('\r\n')


if __name__ == '__main__':
    # input.dat is a roughly 4 MB file with 10,000 lines
    open_file_read_all_then_parse("./input.dat")
    open_file_read_and_parse("./input.dat")

I used an additional module called memory_profiler to help track my memory usage. It helped me verify where my memory problems were coming from and may be useful for your debugging as well: it reports line-by-line memory usage for the profiled functions.
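With the @profile decorator already in place in readingfiles.py above, the script can be run under the profiler from the command line (assuming memory_profiler has been installed via pip):

```shell
# install the profiler (one-time)
pip install memory_profiler

# run the script under the profiler; a line-by-line memory report
# is printed for each function decorated with @profile
python -m memory_profiler readingfiles.py
```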

For more in-depth profiling, I recommend this article by Huy Nguyen.
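One more idea, not part of the original answer: the traceback shows the MemoryError in the pool's result-handling thread (`task = get()`), so the full result list that map_async accumulates may itself be what exhausts memory. A minimal sketch, using a simple line-counting worker and temporary stand-in files (both hypothetical, just for illustration), of `pool.imap_unordered`, which yields each result as it finishes instead of collecting them all at once:

```python
import multiprocessing as mp
import os
import tempfile


def count_lines(filename):
    """Count lines one at a time so only the current line is in memory."""
    count = 0
    with open(filename, "r") as data:
        for _ in data:
            count += 1
    return filename, count


if __name__ == "__main__":
    # create two small demo files (stand-ins for readinfiles)
    tmpdir = tempfile.mkdtemp()
    filenames = []
    for i in range(2):
        path = os.path.join(tmpdir, "demo{}.txt".format(i))
        with open(path, "w") as f:
            f.write("x\n" * (i + 1))
        filenames.append(path)

    pool = mp.Pool(maxtasksperchild=2)
    # imap_unordered yields each result as soon as a worker finishes,
    # so the parent never holds the entire result list at once the
    # way map_async(...).get() does
    for name, count in pool.imap_unordered(count_lines, filenames, chunksize=1):
        print(os.path.basename(name), count)
    pool.close()
    pool.join()
```

Results arrive in completion order rather than submission order, so pair each result with its filename (as the worker above does) if ordering matters.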