I have this DataFrame:
person code year Height Size ...
0 73163529000108 2013 6.293900e+07 6.292900e+07
1 73163529000108 2012 5.206400e+07 5.282500e+07
2 73163529000108 2014 7.293900e+07 5.292900e+07
3 68402163000134 2013 3.225900e+07 2.389000e+06
4 68402163000134 2012 5.779300e+07 5.304800e+07
...
I'd like to add a "Height Year Growth" column and a "Size Year Growth" column, so that it looks like this:
person code year Height Height Y Growth Size ...
0 73163529000108 2013 6.293900e+07 0.2096 6.292900e+07
1 73163529000108 2012 5.206400e+07 5.282500e+07
2 73163529000108 2014 7.293900e+07 0.1589 5.292900e+07
3 68402163000134 2013 3.225900e+07 2.389000e+06
4 68402163000134 2012 5.779300e+07 -0.4419 5.304800e+07
...
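I don't mind what format the result comes out in; I just need it to scale. I'm having a hard time with this. Can anyone suggest an approach?
Answer 0 (score: 4)
You are looking for pct_change: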
df[['YC','SC']]=df.sort_values(['year']).groupby('personcode')[['Height','Size']].pct_change()
df
Out[1083]:
personcode year Height Size YC SC
0 73163529000108 2013 6.2939 6.2929 0.208878 0.191273
1 73163529000108 2012 5.2064 5.2825 NaN NaN
2 73163529000108 2014 7.2939 5.2929 0.158884 -0.158909
3 68402163000134 2013 3.2259 2.3890 -0.441818 -0.549653
4 68402163000134 2012 5.7793 5.3048 NaN NaN