Question

I have a pandas DataFrame with baseball data, like this:

 import pandas as pd
 data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]] 
 df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])

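My goal is to transform this DataFrame so that each row contains player_id, year t, the home runs in year t, the home runs in year t-1 and the home runs in year t+1, with one row for every year t for which t-1, t and t+1 all exist in df.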

For my example, the output would be:

data_output = [['1','2007', 8,10,14], ['2','2011', 50,54,14]]
df_output = pd.DataFrame(data_output, columns = ['player_id','year_t','homeruns_t','homeruns_t_minus_1', 'homeruns_t_plus_1'])

Is there a good way to do this? Is this part of any Python time series package?
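The requirement that t-1, t and t+1 all exist in df can be expressed literally as a self-join on year ± 1. Below is a minimal sketch, assuming `year` can be converted to int (it is a string in the sample data, so `year_t` comes out as an int rather than the string shown in the target); `prev`, `nxt` and `out` are illustrative names only:

```python
import pandas as pd

data = [['1', '2006', 10], ['1', '2007', 8], ['1', '2008', 14],
        ['2', '2010', 54], ['2', '2011', 50], ['2', '2012', 14]]
df = pd.DataFrame(data, columns=['player_id', 'year', 'homeruns'])

# Integer years let us compute t-1 and t+1 arithmetically.
df['year'] = df['year'].astype(int)

# Relabel each row as the "previous" row of year+1 and the "next" row of year-1.
prev = df.assign(year=df['year'] + 1).rename(columns={'homeruns': 'homeruns_t_minus_1'})
nxt = df.assign(year=df['year'] - 1).rename(columns={'homeruns': 'homeruns_t_plus_1'})

# Inner joins keep only years t for which the same player also has t-1 and t+1.
out = (df.merge(prev, on=['player_id', 'year'])
         .merge(nxt, on=['player_id', 'year'])
         .rename(columns={'year': 'year_t', 'homeruns': 'homeruns_t'}))
print(out)
```

Because the joins match on actual year values rather than on row positions, the data does not need to be sorted, and a missing season simply produces no row for the years around the gap.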

Answer 1

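If I understand you correctly, you want each player_id's home runs for the current year, the previous year and the following year (assuming the data is sorted by year):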

import pandas as pd

data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]]

df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])

# previous and next row's home runs within each player's group
df['homeruns_t_minus_1'] = df.groupby(['player_id'])['homeruns'].shift()
df['homeruns_t_plus_1'] = df.groupby(['player_id'])['homeruns'].shift(-1)
# keep only rows that have both neighbours, then cast all columns to int
print( df[~(df['homeruns_t_minus_1'].isna() | df['homeruns_t_plus_1'].isna())].astype(int) )

Prints:

   player_id  year  homeruns  homeruns_t_minus_1  homeruns_t_plus_1
1          1  2007         8                  10                 14
4          2  2011        50                  54                 14
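`shift()` pairs each row with its neighbouring row, not with the neighbouring year, so if a player has a missing season the result would pair non-consecutive years. A minimal sketch of a guard, assuming integer years and adding a hypothetical player '3' (2001, 2003, 2004) purely to illustrate a gap, checks that the shifted year is exactly one away:

```python
import pandas as pd

# Player '3' is made-up data with a missing 2002 season.
data = [['1', '2006', 10], ['1', '2007', 8], ['1', '2008', 14],
        ['2', '2010', 54], ['2', '2011', 50], ['2', '2012', 14],
        ['3', '2001', 5], ['3', '2003', 7], ['3', '2004', 9]]
df = pd.DataFrame(data, columns=['player_id', 'year', 'homeruns'])
df['year'] = df['year'].astype(int)
df = df.sort_values(['player_id', 'year'])

g = df.groupby('player_id')
df['homeruns_t_minus_1'] = g['homeruns'].shift()
df['homeruns_t_plus_1'] = g['homeruns'].shift(-1)

# Accept neighbours only if they are exactly one year before/after.
prev_ok = g['year'].shift() == df['year'] - 1
next_ok = g['year'].shift(-1) == df['year'] + 1

out = df[prev_ok & next_ok].astype({'homeruns_t_minus_1': int, 'homeruns_t_plus_1': int})
print(out)
```

Without the year checks, player '3' would get a row for 2003 that uses 2001 as "t-1".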

Converting a pandas DataFrame to a specific time series structure
