I am using map_async to share out my workload, but I am getting a MemoryError and I cannot find a solution or workaround. This is the error I get:
Exception in thread Thread-3:
Traceback (most recent call last):
File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Python27\lib\multiprocessing\pool.py", line 380, in _handle_results
task = get()
MemoryError
Here is the code:
pool = Pool(maxtasksperchild=2)
fixed_args = (targetdirectorytxt, value_dict)
varg = ((filename,) + fixed_args for filename in readinfiles)
op_list = pool.map_async(PPD_star, list(varg), chunksize=1)
while not op_list.ready():
print("Number of files left to process: {}".format(op_list._number_left))
time.sleep(600)
op_list = op_list.get()
pool.close()
pool.join()
Here is what I have tried. Are there any suggestions for avoiding this error?
I am reading in the files like this:
with open(os.path.join(txtdatapath, pathfilename), "r") as data:
    datalines = (line.rstrip('\r\n') for line in data)
    for record in datalines:
Answer 0 (score: 0):
I agree with @AndréLaszio: the file is probably too large to hold in memory. Changing your logic so that only one line is kept in memory at a time should relieve the memory pressure, unless each individual line is very large.
Below is an alternative approach that opens the file and processes it one line at a time. Holding the entire file contents in memory as an array is an expensive operation.
readingfiles.py:
from memory_profiler import profile


@profile
def open_file_read_all_then_parse(filename):
    """
    Open a file and remove all the newline characters from each line, then
    parse the resulting array of clean lines.
    """
    with open(filename, "r") as data:
        datalines = (line.rstrip('\r\n') for line in data)
        for record in datalines:
            pass


@profile
def open_file_read_and_parse(filename):
    """
    Open a file and iterate over each line of the file while stripping the
    record of any newline characters.
    """
    with open(filename, "r") as data:
        for record in data:
            record.rstrip('\r\n')


if __name__ == '__main__':
    # input.dat is a roughly 4 MB file with 10000 lines
    open_file_read_all_then_parse("./input.dat")
    open_file_read_and_parse("./input.dat")
I used an additional module, memory_profiler, to help track memory usage. It helped me verify where my memory problem was coming from and may be useful for your debugging: it reports the program's memory usage line by line and shows where the memory is being consumed.
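As a minimal usage sketch (assuming memory_profiler has been installed, e.g. with pip install memory_profiler): since the script above already imports profile from memory_profiler, running it normally will print the line-by-line report, and the profiler module can also be invoked explicitly.

# assumes memory_profiler is installed: pip install memory_profiler
python readingfiles.py
# or, equivalently, run the script under the profiler module:
python -m memory_profiler readingfiles.py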
For more in-depth performance analysis, I recommend this article by Huy Nguyen.