Calling all HDF5 and multiprocessing gurus... First, I know that python's h5py and the multiprocessing module don't always play nicely with each other, but I'm getting an error I just can't figure out. The script I'm working on creates a temporary in-memory HDF5 file, stores data from a (pickled) input file in the in-memory file, and then runs a multiprocessing pool that performs a (read-only) operation on the data in the temporary HDF5 file.
I've been able to isolate the code that causes the error, so here is a simplified snippet. When I create the in-memory HDF5 file in a generator function and then use that generator to produce arguments for the multiprocessing pool, I get a series of HDF5 errors. Here is the code, about as simple as I could make it while still reproducing the error:
import h5py
from multiprocessing import Pool
from itertools import imap

useMP = True

def doNothing(arg):
    print "Do nothing with %s" % arg

def myGenerator():
    print "Create hdf5 in-memory file..."
    hdfFile = h5py.File('test.hdf', driver='core', backing_store=False)
    print "Finished creating hdf5 in-memory file."
    yield 3.14159
    '''
    # uncommenting this section will result in yet another HDF5 error.
    print "Closing hdf5 in-memory file..."
    hdfFile.close()
    print "Finished closing hdf5 in-memory file."
    '''

if useMP:
    pool = Pool(1)
    mapFunc = pool.imap
else:
    mapFunc = imap

data = [d for d in mapFunc(doNothing, myGenerator())]
When I use itertools.imap (set useMP=False), I get the following output, as expected:
Create hdf5 in-memory file...
Finished creating hdf5 in-memory file.
Do nothing with 3.14159
But when I use Pool.imap, even though the pool was created with just a single worker, I get this output:
Create hdf5 in-memory file...
HDF5-DIAG: Error detected in HDF5 (1.8.9) thread 139951680009984:
#000: ../../../src/H5F.c line 1538 in H5Fopen(): unable to open file
major: File accessability
minor: Unable to open file
#001: ../../../src/H5F.c line 1227 in H5F_open(): unable to open file: time = Wed Feb 27 09:32:32 2013
, name = 'test.hdf', tent_flags = 1
major: File accessability
minor: Unable to open file
#002: ../../../src/H5FD.c line 1101 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#003: ../../../src/H5FDcore.c line 464 in H5FD_core_open(): unable to open file
major: File accessability
minor: Unable to open file
Finished creating hdf5 in-memory file.
Do nothing with 3.14159
The strange thing is that this error doesn't crash the program. The script I'm writing that triggers this error actually works as I expect — but it prints the above error for every in-memory file it creates. There are no errors when using itertools.imap, and no errors when reading an existing HDF5 file; only the combination of multiprocessing and an in-memory HDF5 file produces them.
h5py version 2.1.1
HDF5 version 1.8.9
Python version 2.7.3
Answer 0 (score: 2):
After digging through some of the h5py source files, I found a partial answer, though it is still incomplete. The h5py.File class is defined in h5py/_hl/files.py. The error occurs during creation of the File object, in the call to make_fid():
def make_fid(name, mode, userblock_size, fapl):
    """ Get a new FileID by opening or creating a file.
    Also validates mode argument."""
    ...
    elif mode == 'a' or mode is None:
        try:
            fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
            try:
                existing_fcpl = fid.get_create_plist()
                if userblock_size is not None and existing_fcpl.get_userblock() != userblock_size:
                    raise ValueError("Requested userblock size (%d) does not match that of existing file (%d)" % (userblock_size, existing_fcpl.get_userblock()))
            except:
                fid.close()
                raise
        except IOError:
            fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
If the file exists, it is opened in read/write mode. If it does not exist (the case for an in-memory file), the h5f.open() call raises an exception. It is this call to h5f.open() that triggers the H5Fopen() error message.
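The control flow of that fallback can be sketched in plain Python. This is only an illustration of the open-existing-else-create-exclusively pattern; the open_or_create helper and the use of os.open in place of the h5f calls are my own stand-ins, not h5py code:

```python
import os
import tempfile

def open_or_create(path):
    """Mimic make_fid's 'a' mode: try to open an existing file
    read/write; fall back to exclusive creation if that fails."""
    try:
        # analogous to h5f.open(name, h5f.ACC_RDWR, ...)
        fd = os.open(path, os.O_RDWR)
        return fd, 'opened'
    except OSError:
        # analogous to h5f.create(name, h5f.ACC_EXCL, ...)
        fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL)
        return fd, 'created'

path = os.path.join(tempfile.mkdtemp(), 'test.hdf')

fd1, how1 = open_or_create(path)  # file does not exist yet
os.close(fd1)
fd2, how2 = open_or_create(path)  # file now exists
os.close(fd2)

print(how1)  # created
print(how2)  # opened
```

For an in-memory file the first branch always fails, which is why the H5Fopen() diagnostic is emitted every time, even though the subsequent create succeeds.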
The remaining question is why the error is only printed when using multiprocessing. At first I assumed the main thread was calling the generator function that creates the HDF5 file. Well, that's not the case. The multiprocessing pool actually creates a separate thread to handle the tasks for imap and imap_unordered, but not for map/map_async. Replace pool.imap with pool.map, and the generator function is called from the main thread, and no error messages are printed. So why does creating the HDF5 file in a separate thread raise the error?
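This difference can be observed directly without h5py at all: record which thread consumes the generator under pool.imap versus pool.map. The shared-list recording trick below is my own illustration, not anything the pool itself exposes:

```python
import threading
from multiprocessing import Pool

def work(x):
    return x * 2

def gen(record):
    # note which thread pulls items from this generator;
    # either way the generator runs in the parent process
    record.append(threading.current_thread().name)
    yield 1

pool = Pool(1)

imap_consumer = []
list(pool.imap(work, gen(imap_consumer)))

map_consumer = []
pool.map(work, gen(map_consumer))

pool.close()
pool.join()

print(imap_consumer)  # a pool helper thread, not MainThread
print(map_consumer)   # the main thread
```

pool.map converts its iterable to a list in the calling thread before dispatching, while pool.imap hands the iterable to the pool's internal task-handler thread, which is where the HDF5 file ends up being created.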
--------更新-----------
Apparently, h5py automatically silences HDF5 error messages in the main thread, since h5py handles the errors itself. It does not, however, automatically silence errors in other threads. Solution: h5py._errors.silence_errors(). This function disables automatic HDF5 error printing in the current thread. See h5py issue 206. This code silences the HDF5 errors:
import h5py
from multiprocessing import Pool
from itertools import imap
import threading

useMP = True

def doNothing(arg):
    print "Do nothing with %s" % arg

def myGenerator():
    h5py._errors.silence_errors()
    print "Create hdf5 in-memory file..."
    hdfFile = h5py.File('test.hdf', driver='core', backing_store=False)
    print "Finished creating hdf5 in-memory file."
    yield 3.14159

if useMP:
    pool = Pool(1)
    mapFunc = pool.imap
else:
    mapFunc = imap

data = [d for d in mapFunc(doNothing, myGenerator())]
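Why silence_errors() has to be called inside the generator (i.e., in the thread that actually touches HDF5) can be illustrated with plain threading.local. This is a generic sketch of per-thread state, not h5py code: a setting stored thread-locally, as the HDF5 error handler is, is simply invisible to every other thread.

```python
import threading

state = threading.local()
state.silenced = True        # "silence errors" in the main thread only

seen = {}

def worker():
    # a different thread sees its own, empty thread-local state
    seen['silenced'] = getattr(state, 'silenced', False)

t = threading.Thread(target=worker)
t.start()
t.join()

print(state.silenced)    # True  (main thread)
print(seen['silenced'])  # False (worker thread was never silenced)
```

So each thread that opens HDF5 files needs its own silence_errors() call, which is exactly what placing it at the top of the generator achieves for the pool's task-handler thread.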