Question

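I have an HDF5 database that is giving me some trouble.

It is supposed to contain 3,000 tables, each with 6 columns (integers and floats), an index (dates), and a variable number of rows (from 100 to 10,000,000).

Since yesterday, thousands of tables are missing when I browse the database with ViTables, although I used to be able to see them there. The data is still present, though: I can still access it through Pandas.

The data is grouped as follows: type/source/id

For example, I can retrieve id1 and id2 using: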

 with pd.get_store(HDF_DATABASE) as store:
     print store['type1/source1/id1']
     print store['type2/source2/id2']

But from ViTables, I cannot see type2/source2/id2.

Also, "print store" lists type1/source1/id1 but does not list type2/source2/id2.

Any suggestions on how to fix these "invisible" tables?

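EDIT:

Fixed typos
Windows 7 32-bit / Python 2.7.5 / Pandas 0.12.0 (and other versions in the past)
ptdump file: http://pastebin.com/7mB6bT2T
As one might expect, I obfuscated the type/source/id names
It looks like the data is no longer referenced, but it is still there as long as the database has not been ptrepack-ed.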

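EDIT2:

I have completely lost the original database: I can no longer access it. The format is not recognized.
This statement, used to insert new data (along with other similar ones), raises a NaturalNameWarning: store.append('equity/bloomberg/4615238QCN_Equity', df). The name does not meet the natural naming requirements, which is what produces the warning. This may be related to the problem.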
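
To see which part of a key triggers the warning, the components can be checked against the identifier pattern quoted in the PyTables warning message. This is only a minimal sketch: invalid_components is a hypothetical helper, not part of Pandas or PyTables, and it covers only the pattern check, not any other naming rule PyTables may apply.

```python
import re

# Pattern quoted in the NaturalNameWarning message (a plain Python identifier).
NATURAL_NAME = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$')


def invalid_components(key):
    """Hypothetical helper: return the '/'-separated components of a key
    that do not match the natural naming pattern."""
    parts = [p for p in key.split('/') if p]
    return [p for p in parts if not NATURAL_NAME.match(p)]


# The last component starts with a digit, so it fails the pattern.
print(invalid_components('equity/bloomberg/4615238QCN_Equity'))
# The obfuscated example keys all pass.
print(invalid_components('type1/source1/id1'))
```

Under these assumptions, only 4615238QCN_Equity is reported for the first key; prefixing it with a letter or underscore would make it match the pattern.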

Answer 1

Example session

In [1]: store = pd.HDFStore('test.h5')

In [2]: store['node()'] = Series(np.arange(10))
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'node()'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

In [3]: store
Out[3]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df                frame_table  (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/node()            series       (shape->[10])                                                   

In [4]: store.keys()
Out[4]: ['/df', '/node()']

In [5]: store['node()/foo'] = Series(np.arange(10))

In [6]: store.keys()
Out[6]: ['/df', '/node()', '/node()/foo']

In [7]: store
Out[7]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df                    frame_table  (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/node()                series       (shape->[10])                                                   
/node()/foo            series       (shape->[10])                                                   

In [8]: store['my_type\mysource\id_01_01'] = Series(np.arange(10))
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'my_type\\mysource\\id_01_01'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
  NaturalNameWarning)

In [9]: store
Out[9]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df                                   frame_table  (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/my_type\mysource\id_01_01            series       (shape->[10])                                                   
/node()                               series       (shape->[10])                                                   
/node()/foo                           series       (shape->[10])                                                   

In [10]: store.keys()
Out[10]: ['/df', '/my_type\\mysource\\id_01_01', '/node()', '/node()/foo']

In [11]: store['my_type/mysource/id_01_01'] = Series(np.arange(10))

In [12]: store
Out[12]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df                                   frame_table  (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/my_type\mysource\id_01_01            series       (shape->[10])                                                   
/node()                               series       (shape->[10])                                                   
/node()/foo                           series       (shape->[10])                                                   
/my_type/mysource/id_01_01            series       (shape->[10])

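The problem is that the identifier 'my_type\mysource\id_01_01' does not do what you think. It looks like a file path, but it is stored as a single node name (see Out[10]). You need forward slashes, not backslashes (backslashes are platform-dependent as path separators). In theory the backslash names could still work, but to avoid the warning you probably want to change them.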
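
The difference between the two separators can be illustrated without opening a store. The sketch below assumes, as the session output suggests, that only '/' separates levels in a key; normalize_key and hierarchy are hypothetical helpers introduced for illustration, not Pandas or PyTables functions.

```python
def hierarchy(key):
    """Split a store key into its levels; only '/' acts as a separator."""
    return [p for p in key.split('/') if p]


def normalize_key(key):
    """Hypothetical helper: turn backslashes into forward slashes and
    return the key with a single leading '/'."""
    parts = [p for p in key.replace('\\', '/').split('/') if p]
    return '/' + '/'.join(parts)


backslash_key = 'my_type\\mysource\\id_01_01'

# With backslashes the whole string is one level, i.e. a single node name.
print(hierarchy(backslash_key))

# After normalization the key has three levels: type, source, id.
print(normalize_key(backslash_key))
print(hierarchy(normalize_key(backslash_key)))
```

Normalizing keys this way before calling store.append or store[...] = ... would keep every table under the intended type/source/id groups.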
