I generate an empty DataFrame as follows:
import pandas as pd
topFields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottomFields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
resultsDf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topFields, bottomFields]))
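Now I want to set the first two columns (the ones with desc as the top-level value) as the index. The more general challenge is to do this for all columns with desc as the top-level value. I tried several approaches, but none of them worked.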
This is the most intuitive attempt (it fails):
>>> test = resultsDf.set_index('desc')
>>> test
Out[4]:
Empty DataFrame
Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []
>>> test.index
Out[5]: Index([], dtype='object', name='desc')
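It correctly drops the desc columns (they are gone from "Columns"), but neither of them shows up in the index. Instead, the index contains only a single field. When I try to create a row based on the MultiIndex, I get an error.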
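A likely explanation for the behaviour above, offered as a sketch rather than a confirmed diagnosis: with MultiIndex columns, a bare top-level label such as 'desc' addresses a whole group of columns, while each individual column is addressed only by its full (top, bottom) tuple. The snippet below rebuilds the same empty frame (variable names switched to snake_case for illustration) and shows the difference between the two kinds of key.

```python
import pandas as pd

# Same empty frame as in the question (names switched to snake_case here)
top_fields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottom_fields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
results_df = pd.DataFrame(columns=pd.MultiIndex.from_arrays([top_fields, bottom_fields]))

# Both a bare top-level label and a full tuple count as "in" the columns
print('desc' in results_df.columns)           # True
print(('desc', 'foo') in results_df.columns)  # True

# A top-level label selects a sub-frame holding all of its second-level columns
desc_block = results_df['desc']
print(type(desc_block).__name__, list(desc_block.columns))  # DataFrame ['foo', 'bar']

# A full (top, bottom) tuple selects exactly one column
single = results_df[('desc', 'foo')]
print(type(single).__name__)  # Series
```

Because 'desc' resolves to a two-column block rather than a single column, it is not a suitable key for building one index level per column, which is why the full tuples are needed.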
Answer:
It looks like set_index needs the full column key as a tuple:
test = resultsDf.set_index(('desc', 'foo'))
print (test)
Empty DataFrame
Columns: [(desc, bar), (price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []
print (test.index)
Index([], dtype='object', name=('desc', 'foo'))
Or perhaps:
test = resultsDf.set_index([('desc', 'foo'), ('desc', 'bar')])
print (test)
Empty DataFrame
Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
Index: []
print (test.index)
MultiIndex(levels=[[], []],
labels=[[], []],
names=[('desc', 'foo'), ('desc', 'bar')])
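The same idea extends to the more general case from the question, using every column whose top level is desc as the index without typing the tuples by hand. A minimal sketch, assuming a reasonably recent pandas: collect the full column keys with MultiIndex.get_level_values(0) and pass the resulting list of tuples to set_index. The two sample rows are hypothetical and only there so that the resulting index has visible values.

```python
import pandas as pd

top_fields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottom_fields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
columns = pd.MultiIndex.from_arrays([top_fields, bottom_fields])

# Hypothetical sample rows, only so that the resulting index has visible values
results_df = pd.DataFrame(
    [['a', 'x', 1.0, 0.1, 10.0, 0.5],
     ['b', 'y', 2.0, 0.2, 20.0, 0.6]],
    columns=columns,
)

# Every full column key (a tuple) whose top level is 'desc'
is_desc = results_df.columns.get_level_values(0) == 'desc'
desc_cols = results_df.columns[is_desc].tolist()
print(desc_cols)  # [('desc', 'foo'), ('desc', 'bar')]

# A list of tuples gives one index level per column
indexed = results_df.set_index(desc_cols)
print(list(indexed.index.names))  # [('desc', 'foo'), ('desc', 'bar')]
print(indexed.loc[('a', 'x'), ('price', 'mean')])  # 1.0
```

Filtering on get_level_values keeps the approach independent of how many second-level columns sit under desc; a list comprehension such as [c for c in results_df.columns if c[0] == 'desc'] would work equally well.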