Create new DataFrame columns based on year

Asked: 2019-12-02 21:11:54

Tags: python-3.x pandas

I have a pandas DataFrame containing NFL quarterback data from the 2015-2016 season through the 2019-2020 season. The DataFrame looks like this:

Player             Season End Year       YPG         TD      
Tom Brady            2019               322.6        25 
Tom Brady            2018               308.1        26
Tom Brady            2017               295.7        24
Tom Brady            2016               308.7        28
Aaron Rodgers        2019               360.4        30
Aaron Rodgers        2018               358.8        33 
Aaron Rodgers        2017               357.9        35
Aaron Rodgers        2016               355.2        32

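I would like to create new columns that hold the data for a year I choose and for the three years before it. For example, if the chosen year is 2019, the resulting DataFrame would be (SY stands for the selected year):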

Player          Season End Year      YPG SY             YPG SY-1      YPG SY-2     YPG SY-3      TD     
Tom Brady           2019              322.6               308.1         295.7        308.7       25
Aaron Rodgers       2019              360.4               358.8         357.9        355.2       30

This is how I am trying to do it:

NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']), 'YPG SY'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-1), 'YPG SY-1'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-2), 'YPG SY-2'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-3), 'YPG SY-3'] = NFL_Data['YPG']

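However, when I run the code above, it does not fill in the columns correctly. Most of the rows are 0. Am I approaching this the right way, or is there a better way to solve it?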

(Edited to include the TD column)

1 Answer:

Answer 0: (score: 3)

The first step is to pivot the DataFrame.

pivoted = df.pivot_table(index='Player', columns='Season End Year', values='YPG')

which yields

Season End Year   2016   2017   2018   2019
Player                                     
Aaron Rodgers    355.2  357.9  358.8  360.4
Tom Brady        308.7  295.7  308.1  322.6
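For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same pivot. The name `df` follows the answer (the question calls the frame `NFL_Data`). Note that `pivot_table` aggregates duplicates with the mean by default; that should be harmless here, assuming each player/year pair occurs only once.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Player": ["Tom Brady"] * 4 + ["Aaron Rodgers"] * 4,
    "Season End Year": [2019, 2018, 2017, 2016] * 2,
    "YPG": [322.6, 308.1, 295.7, 308.7, 360.4, 358.8, 357.9, 355.2],
    "TD": [25, 26, 24, 28, 30, 33, 35, 32],
})

# One row per player, one column per season, YPG as the cell values.
pivoted = df.pivot_table(index="Player", columns="Season End Year", values="YPG")
print(pivoted)
```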

Then you can select:

pivoted.loc[:, range(year, year-3, -1)]

                  2019   2018   2017
Player                              
Aaron Rodgers    360.4  358.8  357.9
Tom Brady        322.6  308.1  295.7
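Because `range` excludes its stop value, the selection above returns three seasons (SY through SY-2). The question asks for SY through SY-3, so a possible variant is sketched below. `select_seasons` is a hypothetical helper, not part of pandas, and it swaps `.loc` for `reindex` so that a season missing from the data shows up as NaN instead of raising a `KeyError`; that substitution is an assumption about what is wanted.

```python
import pandas as pd

def select_seasons(pivoted, year, n_back=3):
    """Return the columns year, year-1, ..., year-n_back, newest first."""
    wanted = list(range(year, year - n_back - 1, -1))
    # reindex fills seasons that are absent from the data with NaN.
    return pivoted.reindex(columns=wanted)

# Pivoted frame as shown in the answer.
pivoted = pd.DataFrame(
    {
        2016: [355.2, 308.7],
        2017: [357.9, 295.7],
        2018: [358.8, 308.1],
        2019: [360.4, 322.6],
    },
    index=pd.Index(["Aaron Rodgers", "Tom Brady"], name="Player"),
)

selected = select_seasons(pivoted, 2019)
print(selected)
```

Calling `select_seasons(pivoted, 2017)` would still return four columns, with 2015 and 2014 filled with NaN.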

Or, as suggested by Quang:

pivoted.loc[:, year:year-3:-1]
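Label-based slices in `.loc` include both endpoints, so with `year = 2019` this slice returns four columns (2019 down to 2016). Building on that, the following is one possible end-to-end sketch of the layout requested in the question. The column naming scheme (`YPG SY`, `YPG SY-1`, ...) comes from the question; pulling `TD` only from the selected season is an assumption based on the expected output shown there.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": ["Tom Brady"] * 4 + ["Aaron Rodgers"] * 4,
    "Season End Year": [2019, 2018, 2017, 2016] * 2,
    "YPG": [322.6, 308.1, 295.7, 308.7, 360.4, 358.8, 357.9, 355.2],
    "TD": [25, 26, 24, 28, 30, 33, 35, 32],
})
year = 2019  # the selected year (SY)

pivoted = df.pivot_table(index="Player", columns="Season End Year", values="YPG")

# Inclusive label slice: 2019, 2018, 2017, 2016.
ypg = pivoted.loc[:, year:year - 3:-1]

# Rename 2019 -> "YPG SY", 2018 -> "YPG SY-1", and so on.
ypg = ypg.rename(
    columns=lambda y: "YPG SY" if y == year else f"YPG SY-{int(year - y)}"
).rename_axis(columns=None)

# TD taken from the selected season only (assumption, see above).
td = df.loc[df["Season End Year"] == year].set_index("Player")["TD"]

result = ypg.join(td)
result.insert(0, "Season End Year", year)
result = result.reset_index()
print(result)
```

The rows come out sorted by player name, because `pivot_table` sorts its index.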