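Creating an HDF5 file with PyTables that uses the table format and data columns

Time: 2015-11-02 08:17:08

Tags: python pandas hdf5 pytables

I want to read an h5 file that was previously created with PyTables.

I read the file with Pandas and apply some conditions, like this: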

pd.read_hdf('myH5file.h5', 'anyTable', where='some_conditions')

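From another question, I was told that for an h5 file to be "queryable" through read_hdf's where argument, it must be written in table format and, in addition, some columns must be declared as data columns.

I can't find anything about this in the PyTables documentation.

The documentation for PyTables' create_table method says nothing about it.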
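
For context, "table format" and "data columns" are options of pandas' own HDF5 writer rather than of PyTables' create_table. A minimal sketch of a file written that way, assuming a made-up two-column frame and a temporary file path (both hypothetical, not taken from my real data):

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; 'operation' mirrors the column used in the where clause.
df = pd.DataFrame({"operation": [1, 2, 1, 3], "value": [0.5, 1.5, 2.5, 3.5]})

# Hypothetical output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pandas_table.h5")

# format="table" selects the queryable layout; data_columns lists the
# columns that may be referenced in a where= expression.
df.to_hdf(path, key="basic_data", format="table", data_columns=["operation"])

# The where clause is evaluated while reading, so only matching rows come back.
subset = pd.read_hdf(path, key="basic_data", where="operation == 1")
print(subset)
```

A file produced like this can be filtered on operation, but it does not say how to get the same result from a file that PyTables wrote on its own.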

So now, if I try something similar on an h5 file created with PyTables, I get the following:

>>> d = pd.read_hdf('test_file.h5','basic_data', where='operation==1')
C:\Python27\lib\site-packages\pandas\io\pytables.py:3070: IncompatibilityWarning: 
where criteria is being ignored as this version [0.0.0] is too old (or
not-defined), read the file in and write it out to a new file to upgrade (with
the copy_to method)

  warnings.warn(ws, IncompatibilityWarning)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 323, in read_hdf
    return f(store, True)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 305, in <lambda>
    key, auto_close=auto_close, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 665, in select
    return it.get_result()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 1359, in get_result
    results = self.func(self.start, self.stop, where)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 658, in func
    columns=columns, **kwargs)
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3968, in read
    if not self.read_axes(where=where, **kwargs):
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 3196, in read_axes
    values = self.selection.select()
  File "C:\Python27\lib\site-packages\pandas\io\pytables.py", line 4482, in select
    start=self.start, stop=self.stop)
  File "C:\Python27\lib\site-packages\tables\table.py", line 1567, in read_where
    self._where(condition, condvars, start, stop, step)]
  File "C:\Python27\lib\site-packages\tables\table.py", line 1528, in _where
    compiled = self._compile_condition(condition, condvars)
  File "C:\Python27\lib\site-packages\tables\table.py", line 1366, in _compile_condition
    compiled = compile_condition(condition, typemap, indexedcols)
  File "C:\Python27\lib\site-packages\tables\conditions.py", line 430, in compile_condition
    raise _unsupported_operation_error(nie)
NotImplementedError: unsupported operand types for *eq*: int, bytes

Edit

The output above mentions an IncompatibilityWarning and version [0.0.0], but if I check my Pandas and Tables versions, I get:

>>> import pandas
>>> pandas.__version__
'0.15.2'
>>> import tables
>>> tables.__version__
'3.1.1'

So I'm completely confused.

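1 answer:

Answer 0 (score: 0)

I ran into the same problem, and this is what I did.

  1. Create the HDF5 file with PyTables;
  2. Read this HDF5 file with pandas.read_hdf, passing arguments such as "where=where_string, columns=selected_columns";

  3. I got a warning like the following, along with other error messages: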

      

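    d:\Program Files\Anaconda3\lib\site-packages\pandas\io\pytables.py:3065: IncompatibilityWarning: where criteria is being ignored as this version [0.0.0] is too old (or not-defined), read the file in and write it out to a new file to upgrade (with the copy_to method)

    warnings.warn(ws, IncompatibilityWarning)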

  4. I then tried commands like these:

    hdf5_store = pd.HDFStore(hdf5_file, mode='r')
    h5cpt_store_new = hdf5_store.copy(hdf5_new_file, complevel=9, complib='blosc')
    h5cpt_store_new.close()

  5. Run exactly the same command as in step 2 against the new file, and it works.

    pandas.__version__ '0.17.1'

    tables.__version__ '3.2.2'
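
The warning's advice (read the file in and write it out again) can also be followed explicitly instead of going through HDFStore.copy. The sketch below is an illustration under assumptions, not the exact commands from the steps above: the table layout, file names and column names are hypothetical. It builds a table with PyTables alone, checks that the node carries no pandas_version attribute (which appears to be why pandas reports [0.0.0]), rewrites the data in table format with pandas, and then queries the new file.

```python
import os
import tempfile

import pandas as pd
import tables

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "pytables_only.h5")  # hypothetical file names
dst_path = os.path.join(workdir, "rewritten.h5")


class Row(tables.IsDescription):
    # Hypothetical layout; 'operation' mirrors the column queried in the question.
    operation = tables.Int64Col(pos=0)
    value = tables.Float64Col(pos=1)


# Create the table with PyTables only, as in step 1.
with tables.open_file(src_path, mode="w") as h5:
    table = h5.create_table("/", "basic_data", Row)
    table.append([(1, 0.5), (2, 1.5), (1, 2.5)])
    table.flush()
    # A node written by PyTables alone has no pandas metadata attached.
    src_version = getattr(table._v_attrs, "pandas_version", None)

# Read the whole table without a where clause ...
df = pd.read_hdf(src_path, key="basic_data")

# ... and write it back out in pandas' table format, with every column
# declared as a data column so that any of them can be queried.
df.to_hdf(dst_path, key="basic_data", format="table", data_columns=True)

# The rewritten node now carries the pandas storage-format version.
with tables.open_file(dst_path, mode="r") as h5:
    node = h5.get_node("/basic_data")
    dst_version = getattr(node._v_attrs, "pandas_version", None)

# Same kind of query as in step 2, now against the rewritten file.
result = pd.read_hdf(dst_path, key="basic_data", where="operation == 1")
print(src_version, dst_version)
print(result)
```

Reading everything into memory first is only reasonable when the table fits in RAM; for larger files, the store-to-store copy in step 4 avoids writing that code by hand.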