Pandas: Set multiple MultiColumns as MultiIndex

Time: 2016-08-01 13:53:23

Tags: python pandas

I generate an empty DataFrame as follows:

import pandas as pd

topFields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottomFields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
resultsDf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topFields, bottomFields]))

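Now I want to set the first two columns (those with desc as the top-level value) as the index (and, as the more general challenge, all columns with desc as the top-level value). I have tried several approaches, but none of them worked.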

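This is the most intuitive attempt (which fails):

>>> test = resultsDf.set_index('desc')
>>> test
Out[4]:
Empty DataFrame
Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []
>>> test.index
Out[5]: Index([], dtype='object', name='desc')

pandas correctly removed the desc columns from the columns, but none of them show up in the index. Instead, the index has only a single field. When I try to create a row based on a MultiIndex, I get an error.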

1 Answer:

Answer 0 (score: 1)

It looks like set_index needs a tuple here:

test = resultsDf.set_index(('desc', 'foo'))
print (test)
Empty DataFrame
Columns: [(desc, bar), (price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []

print (test.index)
Index([], dtype='object', name=('desc', 'foo'))

Or maybe, passing a list of tuples to get a two-level index:

test = resultsDf.set_index([('desc', 'foo'), ('desc', 'bar')])
print (test)
Empty DataFrame
Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []

print (test.index)
MultiIndex(levels=[[], []],
           labels=[[], []],
           names=[('desc', 'foo'), ('desc', 'bar')])
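
For the more general case raised in the question (every column whose top-level value is desc), one possible approach is to collect the matching column tuples first and pass that list to set_index. This is only a sketch built on the same frame as above; the names descCols and tupleNames are illustrative and not part of the original answer, and renaming the index levels at the end is an optional assumption about what is wanted.

```python
import pandas as pd

topFields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottomFields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
resultsDf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topFields, bottomFields]))

# Each label in a MultiIndex column is a tuple such as ('desc', 'foo'),
# so filter on the first element to find all 'desc' columns.
descCols = [col for col in resultsDf.columns if col[0] == 'desc']

# Passing a list of tuples moves every matching column into the index.
test = resultsDf.set_index(descCols)

# The index level names are the full tuples at this point.
tupleNames = list(test.index.names)

# Optional: replace the tuple names with the bottom-level labels only.
test.index.names = [col[1] for col in descCols]

print(test.index.names)
print(list(test.columns))
```

Building the list from resultsDf.columns avoids hard-coding the bottom-level labels, so the same lines should keep working if more desc columns are added later.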