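I have an HDF5 database that is giving me some trouble.
It should contain 3,000 tables, each with 6 columns (integers and floats), a date index, and a variable number of rows (from 100 to 10,000,000).
Since yesterday, thousands of tables are missing when I look at the database in ViTables. I used to be able to see them there. But the data is still present: I can still access it through pandas.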
The data is grouped as type/source/id.
For example, I can retrieve id1 and id2 with:
    import pandas as pd
    with pd.HDFStore(HDF_DATABASE) as store:
        print(store['type1/source1/id1'])
        print(store['type2/source2/id2'])
But in ViTables I cannot see type2/source2/id2.
Also, printing the store lists type1/source1/id1 but not type2/source2/id2.
Any suggestions on how to fix these "invisible" tables?
EDIT:
EDIT2:
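NaturalNameWarning on store.append('equity/bloomberg/4615238QCN_Equity', df). The key does not follow the natural-naming rules (its last component, 4615238QCN_Equity, starts with a digit), which produces the warning. This may be related to the problem I'm seeing.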
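To see which part of a key triggers the warning, here is a minimal sketch that checks each slash-separated component against the pattern quoted in the PyTables warning message (``^[a-zA-Z_][a-zA-Z0-9_]*$``). The helper name `invalid_components` is hypothetical, not part of pandas or PyTables, and it assumes that regex is the only rule of interest; PyTables may apply further checks (for example, reserved prefixes).

```python
import re

# Pattern quoted in the PyTables NaturalNameWarning message.
# Assumption: this is the only naming rule we check here.
NATURAL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def invalid_components(key):
    """Hypothetical helper: return the parts of an HDFStore key
    (split on "/") that are not valid natural names."""
    parts = [p for p in key.strip("/").split("/") if p]
    return [p for p in parts if not NATURAL_NAME.fullmatch(p)]


for key in ["equity/bloomberg/4615238QCN_Equity", "type1/source1/id1"]:
    print(key, invalid_components(key))
```

Only `4615238QCN_Equity` is flagged for the first key, because an identifier may not begin with a digit; `type1/source1/id1` passes.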
Answer:
Example session:
In [1]: store = pd.HDFStore('test.h5')
In [2]: store['node()'] = Series(np.arange(10))
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'node()'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
NaturalNameWarning)
In [3]: store
Out[3]:
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df frame_table (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/node() series (shape->[10])
In [4]: store.keys()
Out[4]: ['/df', '/node()']
In [5]: store['node()/foo'] = Series(np.arange(10))
In [6]: store.keys()
Out[6]: ['/df', '/node()', '/node()/foo']
In [7]: store
Out[7]:
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df frame_table (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/node() series (shape->[10])
/node()/foo series (shape->[10])
In [8]: store['my_type\mysource\id_01_01'] = Series(np.arange(10))
/usr/local/lib/python2.7/site-packages/tables/path.py:99: NaturalNameWarning: object name is not a valid Python identifier: 'my_type\\mysource\\id_01_01'; it does not match the pattern ``^[a-zA-Z_][a-zA-Z0-9_]*$``; you will not be able to use natural naming to access this object; using ``getattr()`` will still work, though
NaturalNameWarning)
In [9]: store
Out[9]:
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df frame_table (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/my_type\mysource\id_01_01 series (shape->[10])
/node() series (shape->[10])
/node()/foo series (shape->[10])
In [10]: store.keys()
Out[10]: ['/df', '/my_type\\mysource\\id_01_01', '/node()', '/node()/foo']
In [11]: store['my_type/mysource/id_01_01'] = Series(np.arange(10))
In [12]: store
Out[12]:
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df frame_table (typ->appendable,nrows->11,ncols->2,indexers->[index],dc->[A,B])
/my_type\mysource\id_01_01 series (shape->[10])
/node() series (shape->[10])
/node()/foo series (shape->[10])
/my_type/mysource/id_01_01 series (shape->[10])
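The problem is that the identifier 'my_type\mysource\id_01_01' doesn't do what you think. It looks like a file path, but HDF5 treats only forward slashes as group separators, so the whole string becomes a single node directly under the root (/my_type\mysource\id_01_01 in Out[9]), whereas In [11] creates the groups /my_type/mysource with the leaf id_01_01. You need forward slashes, not backslashes (backslash path separators are platform-dependent). In theory names like these still work, but you may want to change them to avoid the warnings.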
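One way to follow that advice is to normalize keys before writing them. The sketch below uses a hypothetical helper, `sanitize_key` (not a pandas API), that converts backslashes to forward slashes, replaces characters outside `[a-zA-Z0-9_]` with underscores, and prefixes an underscore when a component starts with a digit. The specific replacement scheme is an assumption chosen for illustration.

```python
import re

# Characters outside the natural-name alphabet from the PyTables warning.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_key(key):
    """Hypothetical helper: turn a key such as a Windows-style path into
    an HDF5 path whose components are valid Python identifiers."""
    # HDF5 only uses "/" as the group separator, so convert backslashes first.
    parts = key.replace("\\", "/").strip("/").split("/")
    clean = []
    for part in parts:
        if not part:
            continue  # skip empty components from doubled separators
        part = _INVALID_CHARS.sub("_", part)  # e.g. "node()" -> "node__"
        if part[0].isdigit():
            part = "_" + part  # identifiers may not start with a digit
        clean.append(part)
    return "/" + "/".join(clean)


print(sanitize_key("my_type\\mysource\\id_01_01"))
print(sanitize_key("equity/bloomberg/4615238QCN_Equity"))
```

This prints `/my_type/mysource/id_01_01` and `/equity/bloomberg/_4615238QCN_Equity`. Note that the mapping is lossy: two different original names can sanitize to the same key, so check for collisions before rewriting existing data under the new names (for example by reading each old key and writing it back with `store.put`).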