I have data in this format:
MonthYear  HPI     Div  State_fips
1-1993     105.45  7    5
2-1993     105.58  7    5
3-1993     106.23  7    5
4-1993     106.63  7    5

Required pivot table:

Stafips  1-1993  2-1993  3-1993  4-1993
5        105.45  105.58  106.23  106.63
(pandas newbie)
Answer 0 (score: 1)
Use set_index with unstack:
df1 = df.set_index(['State_fips', 'MonthYear'])['HPI'].unstack()
MonthYear   1-1993  2-1993  3-1993  4-1993
State_fips
5           105.45  105.58  106.23  106.63
Alternatively, use pivot:
df1 = df.pivot(index='State_fips', columns='MonthYear', values='HPI')
MonthYear   1-1993  2-1993  3-1993  4-1993
State_fips
5           105.45  105.58  106.23  106.63
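For anyone who wants to try both approaches end to end, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the names `by_unstack` and `by_pivot` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (one row per month, no duplicates).
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 106.63],
    "Div": [7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5],
})

# Approach 1: move both keys into the index, then pivot the inner
# level (MonthYear) out into columns.
by_unstack = df.set_index(["State_fips", "MonthYear"])["HPI"].unstack()

# Approach 2: the same reshape expressed with pivot.
by_pivot = df.pivot(index="State_fips", columns="MonthYear", values="HPI")

print(by_unstack)
print(by_pivot)
```

Both calls should give one row per State_fips and one column per MonthYear. Neither aggregates anything, so they only work when each (State_fips, MonthYear) pair occurs once.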
But if there are duplicates, you need to aggregate with groupby or pivot_table; mean can be changed to sum, median, ...:
print (df)
  MonthYear     HPI  Div  State_fips
0    1-1993  105.45    7           5
1    2-1993  105.58    7           5
2    3-1993  106.23    7           5
3    4-1993  100.00    7           5  <- duplicates: same 4-1993, 5
4    4-1993  200.00    7           5  <- duplicates: same 4-1993, 5
df1 = df.pivot_table(index='State_fips', columns='MonthYear', values='HPI', aggfunc='mean')
MonthYear   1-1993  2-1993  3-1993  4-1993
State_fips
5           105.45  105.58  106.23   150.0  <- (100 + 200) / 2 = 150
df1 = df.groupby(['State_fips', 'MonthYear'])['HPI'].mean().unstack()
MonthYear   1-1993  2-1993  3-1993  4-1993
State_fips
5           105.45  105.58  106.23   150.0  <- (100 + 200) / 2 = 150
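The duplicate case can be reproduced with the sketch below. It rebuilds the five-row frame shown above, shows that plain pivot rejects the repeated (State_fips, MonthYear) pair, and then aggregates with mean and with sum. The variable names (`pivot_failed`, `mean_wide`, `sum_wide`) are illustrative assumptions, not from the original answer.

```python
import pandas as pd

# Same data as above, with two rows for (State_fips=5, MonthYear="4-1993").
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 100.00, 200.00],
    "Div": [7, 7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5, 5],
})

# pivot does not aggregate, so duplicate index/column pairs raise ValueError.
pivot_failed = False
try:
    df.pivot(index="State_fips", columns="MonthYear", values="HPI")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table aggregates the duplicates; here with the mean.
mean_wide = df.pivot_table(index="State_fips", columns="MonthYear",
                           values="HPI", aggfunc="mean")

# groupby + unstack does the same job; here with sum instead of mean.
sum_wide = df.groupby(["State_fips", "MonthYear"])["HPI"].sum().unstack()

print(mean_wide)
print(sum_wide)
```

With mean, the 4-1993 cell is the average of 100 and 200; with sum it is their total. Which aggregation is appropriate depends on what the duplicate rows represent in the real data.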
Finally, if you need to turn the index into a column and remove the columns name:
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
   State_fips  1-1993  2-1993  3-1993  4-1993
0           5  105.45  105.58  106.23   150.0
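To see what each half of that last line does, here is a small sketch that builds the pivoted frame by hand (values taken from the output above) and then flattens it. The names `wide` and `flat` are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the pivoted result: State_fips is the index,
# and the column axis carries the name "MonthYear".
wide = pd.DataFrame(
    [[105.45, 105.58, 106.23, 150.0]],
    index=pd.Index([5], name="State_fips"),
    columns=pd.Index(["1-1993", "2-1993", "3-1993", "4-1993"],
                     name="MonthYear"),
)

# reset_index turns State_fips into an ordinary column;
# rename_axis(None, axis=1) drops the leftover "MonthYear" label
# from the column axis.
flat = wide.reset_index().rename_axis(None, axis=1)

print(flat)
print(flat.columns.name)
```

Without the rename_axis step the frame still works, but "MonthYear" keeps appearing above the row labels when it is printed.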