Question

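When a DataFrame is saved to HDF5 via PyTables, what is the structure of a pandas MultiIndex in the HDF5 file? Is each level stored as a separate index, or is there one joined index?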

Answer 1

It is stored almost exactly like df.reset_index(), except that the index levels automatically become data columns (meaning you can select on them).

In [1]: df = pd.DataFrame({'A' : np.random.randn(9)},index=pd.MultiIndex.from_product([range(3),list('abc')],names=['first','second']))

In [2]: df
Out[2]: 
                     A
first second          
0     a      -1.249058
      b      -0.674645
      c      -0.000458
1     a       0.455390
      b      -1.693221
      c       1.245806
2     a       0.337478
      b       0.672525
      c       0.160914

In [3]: store = pd.HDFStore('test.h5',mode='w')

In [4]: store.append('df',df)

In [5]: store           
Out[5]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df            frame_table  (typ->appendable_multi,nrows->9,ncols->3,indexers->[index],dc->[second,first])

Here is what the actual structure looks like.

In [7]: store.get_storer('df').table
Out[7]: 
/df/table (Table(9,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "second": StringCol(itemsize=1, shape=(), dflt='', pos=2),
  "first": Int64Col(shape=(), dflt=0, pos=3)}
  byteorder := 'little'
  chunkshape := (2621,)
  autoindex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "second": Index(6, medium, shuffle, zlib(1)).is_csi=False,
    "first": Index(6, medium, shuffle, zlib(1)).is_csi=False}
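For comparison with the table description above, here is a minimal in-memory sketch of the df.reset_index() layout the answer refers to. It uses only pandas and NumPy. The variable name `flat` is just for illustration, and the random values will differ from the session.

```python
import numpy as np
import pandas as pd

# Same shape of frame as in the session (random values will differ).
df = pd.DataFrame(
    {"A": np.random.randn(9)},
    index=pd.MultiIndex.from_product(
        [range(3), list("abc")], names=["first", "second"]
    ),
)

# reset_index() turns each index level into an ordinary column. The HDF5
# table has the same flat shape: one column per level ("first", "second")
# next to the value block, plus a plain integer "index" column.
flat = df.reset_index()
print(flat.columns.tolist())  # ['first', 'second', 'A']
```

So each level is stored as its own column with its own PyTables column index, rather than as one joined key.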

Selecting by level name:

In [9]: store.select('df',where='second="b"')
Out[9]: 
                     A
first second          
0     b      -0.674645
1     b      -1.693221
2     b       0.672525

In [10]: store.select('df',where='second="b" & first=2')
Out[10]: 
                     A
first second          
2     b       0.672525
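The session above can be wrapped into a small script. This is a sketch, assuming PyTables is installed (pandas needs it for HDFStore). The helper name `roundtrip_select` and the use of a temporary directory are illustrative choices, not part of the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def roundtrip_select(where):
    """Write a MultiIndex frame to a temporary HDF5 file and query it.

    Returns the original frame, the column names of the on-disk table,
    and the rows selected by the `where` expression.
    """
    df = pd.DataFrame(
        {"A": np.random.randn(9)},
        index=pd.MultiIndex.from_product(
            [range(3), list("abc")], names=["first", "second"]
        ),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        with pd.HDFStore(path, mode="w") as store:
            # append() writes the queryable "table" format.
            store.append("df", df)
            # Column names of the underlying PyTables table.
            colnames = list(store.get_storer("df").table.colnames)
            # Index level names can be used directly in the where clause.
            result = store.select("df", where=where)
    return df, colnames, result
```

Calling `roundtrip_select('second="b" & first=2')` should return a single-row frame, with the `first` and `second` levels restored as a MultiIndex on the result.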
