I have a pandas DataFrame with baseball data, like this:
import pandas as pd

data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]]
df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])
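My goal is to transform this DataFrame so that each row contains player_id, year t, the home runs in year t, the home runs in year t-1, and the home runs in year t+1. There should be one row for every year t for which t-1, t, and t+1 all exist in df.
In my example, the output would be: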
data_output = [['1','2007', 8,10,14], ['2','2011', 50,54,14]]
df_output = pd.DataFrame(data_output, columns = ['player_id','year_t','homeruns_t','homeruns_t_minus_1', 'homeruns_t_plus_1'])
Is there a good way to do this? Is this part of any Python time-series package?
Answer 0 (score: 0)
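If I understand you correctly, you want the home runs for the current year, the previous year, and the following year for each player_id (assuming the data is sorted by year):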
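import pandas as pd

data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]]
df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])
# Previous row's home runs within each player (NaN for a player's first row)
df['homeruns_t_minus_1'] = df.groupby(['player_id'])['homeruns'].shift()
# Next row's home runs within each player (NaN for a player's last row)
df['homeruns_t_plus_1'] = df.groupby(['player_id'])['homeruns'].shift(-1)
# Keep only rows that have both neighbours, then cast every column to int
print( df[~(df['homeruns_t_minus_1'].isna() | df['homeruns_t_plus_1'].isna())].astype(int) )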
Output:
   player_id  year  homeruns  homeruns_t_minus_1  homeruns_t_plus_1
1          1  2007         8                  10                 14
4          2  2011        50                  54                 14
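Note that shift() looks at the neighbouring rows, not the neighbouring years, so the approach above also relies on each player's years being consecutive. If a player could be missing a season, one alternative is to join the frame to itself on year - 1 and year + 1. The following is only a sketch under that assumption: player '3' and its numbers are hypothetical rows added to show a gap, and the names prev, nxt and result are illustrative.

```python
import pandas as pd

data = [['1', '2006', 10], ['1', '2007', 8], ['1', '2008', 14],
        ['2', '2010', 54], ['2', '2011', 50], ['2', '2012', 14],
        # Hypothetical player with a missing 2016 season (not in the question)
        ['3', '2015', 20], ['3', '2017', 30], ['3', '2018', 25]]
df = pd.DataFrame(data, columns=['player_id', 'year', 'homeruns'])
df['year'] = df['year'].astype(int)  # years are strings in the question

# Relabel each season as the following year, so that joining on year t
# brings in the home runs from year t-1.
prev = df.assign(year=df['year'] + 1).rename(
    columns={'homeruns': 'homeruns_t_minus_1'})
# Relabel each season as the preceding year, so that joining on year t
# brings in the home runs from year t+1.
nxt = df.assign(year=df['year'] - 1).rename(
    columns={'homeruns': 'homeruns_t_plus_1'})

# merge() defaults to an inner join, so only (player_id, year) pairs that
# have both a t-1 and a t+1 season survive.
result = (
    df.merge(prev, on=['player_id', 'year'])
      .merge(nxt, on=['player_id', 'year'])
      .rename(columns={'year': 'year_t', 'homeruns': 'homeruns_t'})
)
print(result)
```

With this data, only the 2007 row for player '1' and the 2011 row for player '2' remain. The hypothetical player '3' is dropped entirely because no season has both neighbours, whereas the shift() version would keep the 2017 row and pair it with 2015 and 2018.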