Exception when appending DataFrame chunks with string values to a large HDF5 file with pandas

Time: 2015-08-18 16:14:50

Tags: pandas hdf5 hdfstore

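An exception occurs when appending a pandas.DataFrame() with string values to an HDF5 store once the file size exceeds about 47 GiB (appending numeric values works fine). Neither the minimum string size, nor the number of records, nor the number of columns matters; only the file size does.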

Bottom of the exception traceback:

  File "..\..\hdf5-1.8.14\src\H5FDsec2.c", line 822, in H5FD_sec2_write
file write failed: time = Tue Aug 18 18:26:17 2015
, filename = 'large_file.h5', file descriptor = 4, errno = 22, error message = 'Invalid argument', buf = 0000000066A40018, total write size = 262095, bytes this sub-write = 262095, bytes actually written = 18446744073709551615, offset = 47615949533
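The numbers at the end of this message can be decoded with a little arithmetic. This is a minimal sketch, assuming the HDF5 sec2 driver prints the signed return value of the failed write call as an unsigned 64-bit integer: 18446744073709551615 is 2**64 - 1, i.e. -1, the usual error return of a failed write(), and errno 22 is EINVAL ("Invalid argument"). The offset also shows that the failing write started at about 47.6 GB in decimal units, which is only about 44.3 GiB, so the "47 GiB" figure above probably refers to decimal gigabytes. The helper name `as_signed_64` is hypothetical.

```python
import errno

# Values copied from the H5FD_sec2_write error message above.
BYTES_ACTUALLY_WRITTEN = 18446744073709551615
OFFSET = 47615949533
ERRNO = 22

def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    return value - 2**64 if value >= 2**63 else value

written = as_signed_64(BYTES_ACTUALLY_WRITTEN)  # -1: the write call failed
offset_gb = OFFSET / 10**9    # decimal gigabytes
offset_gib = OFFSET / 2**30   # binary gibibytes

print('bytes actually written (signed):', written)
print('errno {} = {}'.format(ERRNO, errno.errorcode.get(ERRNO)))
print('offset: {:.2f} GB = {:.2f} GiB'.format(offset_gb, offset_gib))
```

Run as is, it reports -1, EINVAL, and an offset of 47.62 GB (44.35 GiB).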

Code to reproduce:

import numpy as np
import pandas as pd

for i in range(200):
    df = pd.DataFrame(np.char.mod('random string object (%f)', np.random.rand(5000000,3)), columns=('A','B','C'))
    print('writing chunk №', i, '...', end='', flush=True)
    with pd.HDFStore('large_file.h5') as hdf:
        # Construct unique index
        try:
            nrows = hdf.get_storer('df').nrows
        except (KeyError, AttributeError):
            # No 'df' table in the file yet (first chunk)
            nrows = 0
        df.index = pd.Series(df.index) + nrows

        # Append the dataframe to the storage. The exception happens here
        hdf.append('df', df, format='table')
    print('done')

Environment: Windows 7 x64 machine, Python 3.4.3, pandas 0.16.2, PyTables 3.2.0, HDF5 1.8.14.

The question is how to fix the problem if it lies in the Python code above, or, if it is HDF5-related, how to avoid it. Thanks.
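If, as described above, the failure depends only on the file size, one workaround worth testing (an untested assumption, not a confirmed fix) is to keep each HDF5 file below the problematic size and roll over to a new numbered file. The helper `current_store_path`, the `large_file_NNN.h5` naming scheme and the 40 GB cap are hypothetical choices for illustration; the cap should leave room for at least one more chunk, because a file just under the limit is still appended to.

```python
import os

# Hypothetical cap, chosen arbitrarily to stay well below the ~47.6e9-byte
# offset at which the write failed in the traceback.
MAX_BYTES = 40 * 10**9

def current_store_path(base, max_bytes=MAX_BYTES):
    """Return the first file in the series base_000.h5, base_001.h5, ...
    that does not exist yet or is still smaller than max_bytes.

    Hypothetical helper: pass the returned path to pd.HDFStore(...)
    instead of the single 'large_file.h5'.
    """
    part = 0
    while True:
        path = '{}_{:03d}.h5'.format(base, part)
        if not os.path.exists(path) or os.path.getsize(path) < max_bytes:
            return path
        part += 1  # this file is already at or over the cap; try the next one
```

In the reproduction loop this would replace the fixed file name, e.g. `with pd.HDFStore(current_store_path('large_file')) as hdf:`. Because `get_storer('df').nrows` then only counts the rows in the current file, a running row counter kept outside the loop would be needed to keep the index unique across files.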
