I have a pandas DataFrame containing NFL quarterback data from the 2015-2016 season through the 2019-2020 season. The DataFrame looks like this:
Player         Season End Year  YPG    TD
Tom Brady      2019             322.6  25
Tom Brady      2018             308.1  26
Tom Brady      2017             295.7  24
Tom Brady      2016             308.7  28
Aaron Rodgers  2019             360.4  30
Aaron Rodgers  2018             358.8  33
Aaron Rodgers  2017             357.9  35
Aaron Rodgers  2016             355.2  32
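I would like to create new columns containing the data for a year I select and for the three years before it. For example, if the selected year is 2019, the resulting DataFrame would be (SY stands for the selected year):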
Player         Season End Year  YPG SY  YPG SY-1  YPG SY-2  YPG SY-3  TD
Tom Brady      2019             322.6   308.1     295.7     308.7     25
Aaron Rodgers  2019             360.4   358.8     357.9     355.2     30
This is how I am trying to do it:
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']), 'YPG SY'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-1), 'YPG SY-1'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-2), 'YPG SY-2'] = NFL_Data['YPG']
NFL_Data.loc[NFL_Data['Season End Year'] == (NFL_Data['SY']-3), 'YPG SY-3'] = NFL_Data['YPG']
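However, when I run the code above, the columns are not filled in correctly; most rows are 0. Am I approaching this the right way, or is there a better way to solve it?
(Edited to include the TD column)
Answer 0 (score: 3)
The first step is to pivot the DataFrame.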
pivoted = df.pivot_table(index='Player', columns='Season End Year', values='YPG')
which yields
Season End Year   2016   2017   2018   2019
Player
Aaron Rodgers    355.2  357.9  358.8  360.4
Tom Brady        308.7  295.7  308.1  322.6
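For anyone who wants to reproduce the pivot step, here is a minimal sketch that rebuilds the sample data from the question and runs the same `pivot_table` call. The variable name `df` follows the answer (the question calls the frame `NFL_Data`). One thing worth knowing: `pivot_table` aggregates duplicate Player/year pairs, using the mean by default, so it will not raise if a player happens to appear twice in a season.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Player": ["Tom Brady"] * 4 + ["Aaron Rodgers"] * 4,
    "Season End Year": [2019, 2018, 2017, 2016] * 2,
    "YPG": [322.6, 308.1, 295.7, 308.7, 360.4, 358.8, 357.9, 355.2],
    "TD": [25, 26, 24, 28, 30, 33, 35, 32],
})

# One row per player, one column per season, YPG as the cell values.
pivoted = df.pivot_table(index="Player", columns="Season End Year", values="YPG")
print(pivoted)
```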
Then you can select the years you want:
pivoted.loc[:, range(year, year-3, -1)]
                2019   2018   2017
Player
Aaron Rodgers  360.4  358.8  357.9
Tom Brady      322.6  308.1  295.7
Or, as suggested by Quang:
pivoted.loc[:, year:year-3:-1]
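To get all the way to the layout requested in the question (YPG SY through YPG SY-3, plus TD for the selected year), the pivot can be combined with a rename and a merge. The following is only a sketch under assumptions: the helper name `ypg_by_selected_year` and its `n_back` parameter are made up for illustration, and `reindex` is used instead of `.loc` so that a season with no data produces NaN rather than a KeyError. Note that `range(year, year-3, -1)` covers only three seasons, so the sketch stops the range one year later to include SY-3.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Player": ["Tom Brady"] * 4 + ["Aaron Rodgers"] * 4,
    "Season End Year": [2019, 2018, 2017, 2016] * 2,
    "YPG": [322.6, 308.1, 295.7, 308.7, 360.4, 358.8, 357.9, 355.2],
    "TD": [25, 26, 24, 28, 30, 33, 35, 32],
})


def ypg_by_selected_year(df, year, n_back=3):
    """Hypothetical helper: one row per player for `year`, with YPG for
    the selected year and the `n_back` seasons before it."""
    pivoted = df.pivot_table(index="Player", columns="Season End Year", values="YPG")

    # Selected year first, then the previous n_back years.
    years = list(range(year, year - n_back - 1, -1))

    # reindex leaves NaN for seasons that are missing from the data.
    wide = pivoted.reindex(columns=years)
    wide.columns = ["YPG SY" if i == 0 else f"YPG SY-{i}" for i in range(len(years))]

    # Keep Player, year and TD from the selected season only.
    selected = df.loc[df["Season End Year"] == year, ["Player", "Season End Year", "TD"]]

    merged = selected.merge(wide, left_on="Player", right_index=True)
    ordered = ["Player", "Season End Year", *wide.columns, "TD"]
    return merged[ordered].reset_index(drop=True)


result = ypg_by_selected_year(df, 2019)
print(result)
```

Calling the helper with a different `year` or `n_back` changes which seasons are spread across the columns; players without a row for the selected year are dropped by the merge.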