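Once the file size exceeds roughly 47 GiB, appending a pandas.DataFrame() that contains string values to an HDF5 store raises an exception (numeric values work fine). The minimum string size, the number of records, and the number of columns do not matter. Only the file size matters.
Bottom of the exception traceback: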
File "..\..\hdf5-1.8.14\src\H5FDsec2.c", line 822, in H5FD_sec2_write
file write failed: time = Tue Aug 18 18:26:17 2015
, filename = 'large_file.h5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0000000066A40018, total write size = 262095, bytes this sub-write = 262095, bytes actually written = 18446744073709551615, offset = 47615949533
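As a reading aid, the numbers in the trace can be decoded with a few lines of arithmetic. This is only a sketch that interprets the logged values, not a diagnosis of the cause. It assumes that "bytes actually written" is an unsigned 64-bit rendering of the low-level write result; the variable names are made up for illustration.

```python
import errno
import os

# Values copied from the trace above.
bytes_written = 18446744073709551615   # "bytes actually written"
offset = 47615949533                   # "offset"

# Assumption: the value is a signed result printed as unsigned 64-bit.
# Reinterpret it using two's complement.
signed = bytes_written - 2**64 if bytes_written >= 2**63 else bytes_written
print('write result as signed 64-bit:', signed)

# errno 22 maps to the symbolic name EINVAL.
print('errno 22 is', errno.errorcode[22], '-', os.strerror(22))

# The same offset in decimal gigabytes and in binary gibibytes.
print('offset: {:.2f} GB = {:.2f} GiB'.format(offset / 1e9, offset / 2**30))
```

Read this way, the write call appears to have failed outright rather than written a partial buffer, and the failing offset is about 47.6 GB in decimal units, which is a little over 44 GiB.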
Code to reproduce:
import numpy as np
import pandas as pd

for i in range(200):
    # Build a chunk of 5,000,000 rows x 3 columns of string values
    df = pd.DataFrame(np.char.mod('random string object (%f)', np.random.rand(5000000, 3)), columns=('A', 'B', 'C'))
    print('writing chunk №', i, '...', end='', flush=True)
    with pd.HDFStore('large_file.h5') as hdf:
        # Construct unique index
        try:
            nrows = hdf.get_storer('df').nrows
        except Exception:
            # The 'df' table does not exist yet on the first pass
            nrows = 0
        df.index = pd.Series(df.index) + nrows
        # Append the dataframe to the storage. Exception happens here
        hdf.append('df', df, format='table')
    print('done')
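If the failure really does depend only on file size, one possible way to sidestep it (not a fix for the underlying cause) would be to split the data across several smaller files. The sketch below is an assumption-laden illustration: the helper name pick_shard, the file naming scheme, and the size limit are all hypothetical and are not part of pandas or PyTables.

```python
import os

def pick_shard(base, max_bytes, max_shards=1000):
    """Return the first path 'base.NNN.h5' that is missing or still below max_bytes.

    Hypothetical helper: it only inspects file sizes on disk and knows
    nothing about HDF5 itself.
    """
    for n in range(max_shards):
        path = '{}.{:03d}.h5'.format(base, n)
        if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
            return path
    raise RuntimeError('all {} shards are full'.format(max_shards))

# Hypothetical use inside the loop of the reproduction script,
# with a limit chosen well below the size at which the failure appears:
#     with pd.HDFStore(pick_shard('large_file', 40 * 10**9)) as hdf:
#         hdf.append('df', df, format='table')
```

With this layout the row count returned by get_storer('df').nrows is per file, so a globally unique index would have to be tracked separately, for example with a running counter in the loop.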
Environment: Windows 7 x64 machine, Python 3.4.3, pandas 0.16.2, PyTables 3.2.0, HDF5 1.8.14.
My question is how to fix this if the problem lies in the Python code above, or how to avoid it if it is caused by HDF5. Thanks.