I am trying to pivot a DataFrame with multi-index columns, but it fails with:
Shape of passed values is (3, 4), indices imply (3, 2)
Code:
import pandas as pd
df = pd.DataFrame({
'foo': [1,2,3], 'bar':[4,5,6], 'dt':['2020-01-01', '2020-01-01', '2020-01-02'], 'cat':['a', 'b', 'b']
})
df = df.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]].reset_index()
columns_of_interest = sorted(df.drop(['dt', 'cat'], axis=1, level=0).columns.get_level_values(0).unique())
df.pivot(index='dt', columns='cat', values=columns_of_interest)
How can I fix this?
Expected result:
From:
           dt cat   foo        bar
                  count  50% count  50%
0  2020-01-01   a   1.0  1.0   1.0  4.0
1  2020-01-01   b   1.0  2.0   1.0  5.0
2  2020-01-02   b   1.0  3.0   1.0  6.0
To:
value  foo     bar
cat    a   b   a   b
dt
0
1
2
Basically, I want to compute:
v = 'count'
df['foo'][v].reset_index().pivot(index='dt', columns='cat', values = v)
for each column in [foo, bar] and each aggregation in [count, 50%], and return a single combined result.
That is:
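# Here df is the groupby/describe result with (dt, cat) still in the index.
piv_values = ['count', '50%']
for c in columns_of_interest:
    print(c)
    for piv in piv_values:
        print(piv)
        r = df[c][piv].reset_index().pivot(index='dt', columns='cat', values=piv)
        display(r)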
1) I am just not sure how to recombine the results, and 2) how to find a neat solution.
A fairly neat workaround is to flatten the levels:
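One possible way to recombine the per-pair pivots from the loop above is `pd.concat` along the columns, keyed by the (column, aggregation) pair. This is only a sketch under the assumption that (dt, cat) stay in the index of the describe result; the names `raw`, `stats`, `pieces` and `combined` are made up for illustration.

```python
import pandas as pd

raw = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'cat': ['a', 'b', 'b'],
})

# Keep (dt, cat) in the index; only the statistics of interest remain as columns.
stats = raw.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]]

pieces = {}
for col, agg in stats.columns:
    # One small pivot per (column, aggregation) pair, as in the loop above.
    pieces[(col, agg)] = (
        stats[col][agg].reset_index().pivot(index='dt', columns='cat', values=agg)
    )

# The tuple keys become the two outer column levels; 'cat' is the innermost level.
combined = pd.concat(pieces, axis=1)
print(combined)
```

The result has three column levels (column, aggregation, cat) and one row per dt.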
df.columns = ['_'.join(col).strip() for col in df.columns.values]
columns_of_interest = df.columns
df.reset_index().pivot(index='dt', columns='cat', values=columns_of_interest)
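For reference, here is a self-contained sketch of the flattening workaround. It assumes the flattening is applied to the groupby result while (dt, cat) are still in the index, so that only the statistic columns are renamed; `raw`, `stats` and `flat` are illustrative names.

```python
import pandas as pd

raw = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'cat': ['a', 'b', 'b'],
})

# (dt, cat) stay in the index, so only the statistic columns get flattened.
stats = raw.groupby(['dt', 'cat']).describe().loc[:, pd.IndexSlice[:, ['count', '50%']]]
stats.columns = ['_'.join(col) for col in stats.columns]  # e.g. 'foo_count'

# With single-level column names, pivot accepts a plain list of value columns.
flat = stats.reset_index().pivot(index='dt', columns='cat', values=list(stats.columns))
print(flat)
```

If the columns are flattened after `reset_index()` instead, the key columns come out as `dt_` and `cat_` (the join appends an underscore to the empty second level), so `index='dt'` would no longer match.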
Answer 0 (score: 1)
IIUC, you can use unstack after groupby (without reset_index):
df = pd.DataFrame({
'foo': [1,2,3], 'bar':[4,5,6],
'dt':['2020-01-01', '2020-01-01', '2020-01-02'], 'cat':['a', 'b', 'b']
})
df_ = df.groupby(['dt', 'cat']).describe()\
.loc[:, pd.IndexSlice[:, ['count', '50%']]]\
.unstack() # unstack instead of reset_index
print (df_)
             foo                   bar
           count       50%       count       50%
cat            a    b    a    b      a    b    a    b
dt
2020-01-01   1.0  1.0  1.0  2.0    1.0  1.0  4.0  5.0
2020-01-02   NaN  1.0  NaN  3.0    NaN  1.0  NaN  6.0
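If only one statistic at a time is wanted in the (value, cat) layout sketched in the question, a cross-section on the middle column level might be enough. This is a sketch, not part of the original answer; `raw`, `wide`, `counts` and `medians` are illustrative names.

```python
import pandas as pd

raw = pd.DataFrame({
    'foo': [1, 2, 3], 'bar': [4, 5, 6],
    'dt': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'cat': ['a', 'b', 'b'],
})

wide = (
    raw.groupby(['dt', 'cat']).describe()
       .loc[:, pd.IndexSlice[:, ['count', '50%']]]
       .unstack()
)

# Column levels are (value column, statistic, cat); level 1 holds the statistic.
counts = wide.xs('count', axis=1, level=1)
medians = wide.xs('50%', axis=1, level=1)
print(medians)
```

Each cross-section keeps two column levels, the original column name and cat, with one row per dt.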