我正在研究html解析器,它使用Python多处理池,因为它运行了大量的页面。每页的输出都保存到单独的CSV文件中。问题是有时我得到意外的错误,整个程序崩溃,我几乎无处不在处理错误 - 阅读页面,解析页面,甚至编写文件。此外,它看起来像脚本在完成一批文件的写入后崩溃,所以它不应该是什么可以粉碎。因此,经过一整天的调试,我一言不发。
错误:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ppp\Python\parser\run.py", line 244, in media_process
save_media_product(DIRECTORY, category, media_data)
File "D:\ppp\Python\parser\manage_output.py", line 180, in save_media_product
_file_manager(target_file, temp, temp2)
File "D:\ppp\Python\store_parser\manage_output.py", line 214, in _file_manager
file_to_write.close()
UnboundLocalError: local variable 'file_to_write' referenced before assignment
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\ppp\Python\store_parser\run.py", line 356, in <module>
main()
File "D:\Rzeczy Mariusza\Python\store_parser\run.py", line 318, in main
process.map(media_process, batch)
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 644, in get
raise self._value
UnboundLocalError: local variable 'file_to_write' referenced before assignment
看起来,变量赋值存在错误,但它不是:
try:
file_to_write = open(target_file, 'w')
except OSError:
message = 'OSError while writing file name - {}'.format(target_file)
log_error(message)
except UnboundLocalError:
message = 'UnboundLocalError while writing file name - {}'.format(target_file)
log_error(message)
except Exception as e:
message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
log_error(message)
else:
file_to_write.write(temp)
file_to_write.write(temp2)
finally:
file_to_write.close()
行 - except Exception as e:
,没有任何帮助,整个事情仍然崩溃。到目前为止,我只排除了内存不足的情况,因为这个脚本被设计为在低规格的VPS上处理,但在测试阶段我在8 GB的ram环境中运行它。所以,如果你有任何理论,请分享。