我认为应该可以存储带有MultiIndexed轴的DataFrame。但是,我收到以下错误:
In [1]: index = pd.MultiIndex.from_product([['Foo', 'Bar'],['One','Two','Three']])
column = pd.MultiIndex.from_product([['foo', 'bar'],['one','two','three']])
df = pd.DataFrame(np.random.rand(6,6), index=index, columns=column)
df
Out[1]: foo bar
one two three one two three
Foo One 0.605352 0.882382 0.472946 0.615619 0.108022 0.389674
Two 0.746384 0.594509 0.556881 0.457000 0.529793 0.929574
Three 0.270978 0.956778 0.515201 0.626850 0.852708 0.861962
Bar One 0.219994 0.648191 0.677824 0.408439 0.079326 0.414059
Two 0.186167 0.767103 0.880667 0.205253 0.647471 0.449379
Three 0.353171 0.249900 0.723791 0.458349 0.977604 0.691188
In [2]: with pd.HDFStore('test.h5', 'w') as store:
store.append('output', df)
Out[2]: ---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-55-87e96c141a7f> in <module>()
1 with pd.HDFStore('test.h5', 'w') as store:
----> 2 store.append('output', df)
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in append(self, key, value, format, append, columns, dropna, **kwargs)
917 kwargs = self._validate_format(format, kwargs)
918 self._write_to_group(key, value, append=append, dropna=dropna,
--> 919 **kwargs)
920
921 def append_to_multiple(self, d, value, selector, data_columns=None,
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1262
1263 # write the object
-> 1264 s.write(obj=value, append=append, complib=complib, **kwargs)
1265
1266 if s.is_table and index:
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in write(self, obj, data_columns, **kwargs)
4195 data_columns.insert(0, n)
4196 return super(AppendableMultiFrameTable, self).write(
-> 4197 obj=obj, data_columns=data_columns, **kwargs)
4198
4199 def read(self, **kwargs):
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
3785 self.create_axes(axes=axes, obj=obj, validate=append,
3786 min_itemsize=min_itemsize,
-> 3787 **kwargs)
3788
3789 for a in self.axes:
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
3383 axis, axis_labels = self.non_index_axes[0]
3384 data_columns = self.validate_data_columns(
-> 3385 data_columns, min_itemsize)
3386 if len(data_columns):
3387 mgr = block_obj.reindex_axis(
/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py in validate_data_columns(self, data_columns, min_itemsize)
3246 if info.get('type') == 'MultiIndex' and data_columns:
3247 raise ValueError("cannot use a multi-index on axis [{0}] with "
-> 3248 "data_columns {1}".format(axis, data_columns))
3249
3250 # evaluate the passed data_columns, True == use all columns
ValueError: cannot use a multi-index on axis [1] with data_columns ['level_1', 'level_0']
我最有意义的是存储这样的数据。主要是因为我的需求会有很大差异。对于某些应用程序,我将需要所有行和所有列。对于其他许多人来说,我只需要所有行和一个父列:说我需要foo
下的所有行。我可能还只需要一个父行和一个父列:Foo, foo
。
我当然需要所有辅助行和列。
在我的例子中,主行索引是States,辅助行索引是传感器名称,主列索引是感测到的不同内容,辅助列索引是传感器输出的统计信息。因此,很容易看出我可能只需要一种类型的感知数据用于所有状态,或仅用于一种状态,或者我可能需要从一个州或所有州感知到的所有事物。
我正在寻找修复错误或更好的方法来存储数据。
答案 0 :(得分:1)
如果您在存储format='fixed'
时强制执行df
,则可以保留您的MultiIndex:
with pd.HDFStore('test.h5', 'w') as store:
store.put('output', df, format='fixed')
print store['output']
foo bar
one two three one two three
Foo One 0.9626 0.9761 0.4385 0.2976 0.0882 0.7589
Two 0.7842 0.7563 0.4796 0.5664 0.1511 0.9345
Three 0.3364 0.4271 0.4107 0.9009 0.5207 0.4082
Bar One 0.9892 0.4595 0.1485 0.1456 0.9935 0.1386
Two 0.3187 0.7908 0.2947 0.7354 0.5759 0.9102
Three 0.0499 0.1865 0.8113 0.4815 0.1427 0.3322
但是你会失去一些功能(例如,使用.append()
方法)。根据您的需要,这可能是也可能不是问题。
答案 1 :(得分:0)
我最终堆叠了DataFrame一次,以便在存储之前删除列中的MultiIndex:
with pd.HDFStore('test.h5', 'w') as store:
store.append('output', df.stack().astype(float))
这不是理想的,但HDF5表的querying功能可以补偿它。