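I have a DF with multiple columns and I want to convert it from rows to columns (a pivot). Most of the solutions I've seen on Stack Overflow only deal with 2 columns.
From this DF: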
PO ID PO Name Region Date Price
1 AA North 07/2016 100
2 BB South 07/2016 200
1 AA North 08/2016 300
2 BB South 08/2016 400
1 AA North 09/2016 500
to this DF:
PO ID PO Name Region 07/2016 08/2016 09/2016
1 AA North 100 300 500
2 BB South 200 400 NaN
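To reproduce the question, the input frame can be built directly. This sketch assumes Date is stored as plain strings such as '07/2016' (not datetimes) and that PO ID, PO Name and Region together identify one output row. The `key_cols` name is just for illustration:

```python
import pandas as pd

# Input from the question; Date is assumed to be a string column, not datetime
df = pd.DataFrame({
    'PO ID': [1, 2, 1, 2, 1],
    'PO Name': ['AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 200, 300, 400, 500],
})

key_cols = ['PO ID', 'PO Name', 'Region']  # columns that stay as row labels
# Each distinct key combination becomes one row, each distinct Date one column
n_rows = df[key_cols].drop_duplicates().shape[0]
n_cols = df['Date'].nunique()
print(n_rows, n_cols)
```

So the wide result should have 2 rows and 3 date columns, with a NaN wherever a key has no price for a date.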
Answer (score: 4)
#index by every column except Price, then move the Date level into the columns
df = df.set_index(['PO ID','PO Name','Region', 'Date'])['Price'].unstack()
print (df)
Date 07/2016 08/2016 09/2016
PO ID PO Name Region
1 AA North 100.0 300.0 500.0
2 BB South 200.0 400.0 NaN
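On newer pandas (as far as I know 1.1 or later, when DataFrame.pivot started accepting a list for `index`) the same reshape can also be written with pivot. Like set_index + unstack, it only works when every key/Date pair is unique. A sketch with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'PO ID': [1, 2, 1, 2, 1],
    'PO Name': ['AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 200, 300, 400, 500],
})

# Assumes pandas >= 1.1: a list of columns is accepted for `index`
wide = df.pivot(index=['PO ID', 'PO Name', 'Region'], columns='Date', values='Price')
print(wide)
```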
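If there are duplicates (the same PO ID, PO Name, Region and Date with different Price values), you need pivot_table or groupby with an aggregate function: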
print (df)
PO ID PO Name Region Date Price
0 1 AA North 07/2016 100 <- same PO ID, PO Name, Region and Date, different Price
1 1 AA North 07/2016 500 <- same PO ID, PO Name, Region and Date, different Price
2 2 BB South 07/2016 200
3 1 AA North 08/2016 300
4 2 BB South 08/2016 400
5 1 AA North 09/2016 500
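Why aggregation is needed here: with a duplicated key, set_index + unstack has two prices for the same cell and raises a ValueError instead of guessing. A small sketch that detects the duplicate key and shows the failure (the `keys`, `has_dupes` and `failed` names are only for illustration):

```python
import pandas as pd

# Second example data: (1, AA, North, 07/2016) appears twice
df = pd.DataFrame({
    'PO ID': [1, 1, 2, 1, 2, 1],
    'PO Name': ['AA', 'AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 500, 200, 300, 400, 500],
})

keys = ['PO ID', 'PO Name', 'Region', 'Date']
has_dupes = bool(df.duplicated(subset=keys).any())
print('duplicate keys:', has_dupes)

try:
    df.set_index(keys)['Price'].unstack()
    failed = False
except ValueError as err:
    # Two prices would have to go into the same cell, so unstack refuses to reshape
    failed = True
    print('unstack failed:', err)
```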
df = df.pivot_table(index=['PO ID','PO Name','Region'],
columns='Date',
values='Price',
aggfunc='mean')
print (df)
Date 07/2016 08/2016 09/2016
PO ID PO Name Region
1 AA North 300.0 300.0 500.0 <-(100+500)/2=300 for 07/2016
2 BB South 200.0 400.0 NaN
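`aggfunc` decides how the duplicate prices are collapsed, and 'mean' is only one choice. A sketch comparing a few built-in string aggregations on the same duplicated data; which one is right depends on what the duplicates mean in your data:

```python
import pandas as pd

df = pd.DataFrame({
    'PO ID': [1, 1, 2, 1, 2, 1],
    'PO Name': ['AA', 'AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 500, 200, 300, 400, 500],
})

index_cols = ['PO ID', 'PO Name', 'Region']
results = {}
for how in ['mean', 'sum', 'max', 'first']:
    wide = df.pivot_table(index=index_cols, columns='Date',
                          values='Price', aggfunc=how)
    # The only duplicated cell: PO ID 1 in 07/2016 (prices 100 and 500)
    results[how] = wide.loc[(1, 'AA', 'North'), '07/2016']

# expected values: mean 300, sum 600, max 500, first 100
print(results)
```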
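#alternative to pivot_table: the same aggregation with groupby, applied to the original df (not the pivoted one)
df = df.groupby(['PO ID','PO Name','Region', 'Date'])['Price'].mean().unstack()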
print (df)
Date 07/2016 08/2016 09/2016
PO ID PO Name Region
1 AA North 300.0 300.0 500.0 <-(100+500)/2=300 for 07/2016
2 BB South 200.0 400.0 NaN
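pivot_table with aggfunc='mean' and groupby(...).mean().unstack() should give the same table. A quick check on the duplicated data, comparing labels and then values with NaN treated as equal (the missing 09/2016 price for BB is NaN in both):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'PO ID': [1, 1, 2, 1, 2, 1],
    'PO Name': ['AA', 'AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 500, 200, 300, 400, 500],
})

index_cols = ['PO ID', 'PO Name', 'Region']
via_pivot = df.pivot_table(index=index_cols, columns='Date',
                           values='Price', aggfunc='mean')
via_groupby = df.groupby(index_cols + ['Date'])['Price'].mean().unstack()

same_labels = (via_pivot.index.equals(via_groupby.index)
               and list(via_pivot.columns) == list(via_groupby.columns))
same_values = np.allclose(via_pivot.to_numpy(), via_groupby.to_numpy(), equal_nan=True)
print(same_labels, same_values)
```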
Finally, to turn the index levels back into ordinary columns and remove the axis names:
df = df.reset_index().rename_axis(None).rename_axis(None, axis=1)
print (df)
PO ID PO Name Region 07/2016 08/2016 09/2016
0 1 AA North 300.0 300.0 500.0
1 2 BB South 200.0 400.0 NaN
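Putting the pieces together as one method chain on the duplicated data. In this sketch the row-axis rename_axis(None) is left out, because reset_index already leaves an unnamed default RangeIndex; only the 'Date' label on the column axis needs removing:

```python
import pandas as pd

df = pd.DataFrame({
    'PO ID': [1, 1, 2, 1, 2, 1],
    'PO Name': ['AA', 'AA', 'BB', 'AA', 'BB', 'AA'],
    'Region': ['North', 'North', 'South', 'North', 'South', 'North'],
    'Date': ['07/2016', '07/2016', '07/2016', '08/2016', '08/2016', '09/2016'],
    'Price': [100, 500, 200, 300, 400, 500],
})

flat = (
    df.pivot_table(index=['PO ID', 'PO Name', 'Region'], columns='Date',
                   values='Price', aggfunc='mean')
      .reset_index()              # key levels back to ordinary columns
      .rename_axis(None, axis=1)  # drop the leftover 'Date' name on the columns
)
print(flat)
```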