Converting a pandas DataFrame into a specific time-series structure

Time: 2020-01-12 22:39:55

Tags: python pandas dataframe time-series

I have a pandas DataFrame with baseball data, like this:

 import pandas as pd
 data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]]
 df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])

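My goal is to transform this DataFrame so that each row contains player_id, year t, the home runs in year t, the home runs in year t-1, and the home runs in year t+1. There should be one row for every year t for which t-1, t, and t+1 all exist in df for that player.

In my example, the output would be: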

data_output = [['1','2007', 8,10,14], ['2','2011', 50,54,14]]
df_output = pd.DataFrame(data_output, columns = ['player_id','year_t','homeruns_t','homeruns_t_minus_1', 'homeruns_t_plus_1'])

Is there a good way to do this? Is this part of any Python time-series package?

1 Answer:

Answer 0: (score: 0)

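If I understand you correctly, you want, for each player_id, the home runs for the current year, the previous year, and the next year (this assumes the data is sorted by year within each player):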

import pandas as pd

data = [['1','2006', 10], ['1','2007', 8], ['1','2008', 14],['2','2010', 54], ['2','2011', 50], ['2','2012', 14]]

df = pd.DataFrame(data, columns = ['player_id', 'year','homeruns'])

# shift() pulls the previous row's value within each player; shift(-1) pulls the next row's
df['homeruns_t_minus_1'] = df.groupby(['player_id'])['homeruns'].shift()
df['homeruns_t_plus_1'] = df.groupby(['player_id'])['homeruns'].shift(-1)
# keep only rows that have both neighbours, then cast every column to int
print( df[~(df['homeruns_t_minus_1'].isna() | df['homeruns_t_plus_1'].isna())].astype(int) )

Prints:

   player_id  year  homeruns  homeruns_t_minus_1  homeruns_t_plus_1
1          1  2007         8                  10                 14
4          2  2011        50                  54                 14
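Note that shift() works on row positions, so it treats whatever row comes before or after as t-1 and t+1, even if a season is missing. If a player can have gaps between years, a self-merge on the year is one way to require that t-1 and t+1 really exist. The following is only a sketch under that assumption: the helper name with_neighbors and the extra player '3' (who has no 2016 row) are made up for illustration and are not part of the original data.

```python
import pandas as pd

data = [['1', '2006', 10], ['1', '2007', 8], ['1', '2008', 14],
        ['2', '2010', 54], ['2', '2011', 50], ['2', '2012', 14],
        # hypothetical player with a gap (no 2016 season), added for illustration
        ['3', '2015', 20], ['3', '2017', 25], ['3', '2018', 30]]
df = pd.DataFrame(data, columns=['player_id', 'year', 'homeruns'])
df['year'] = df['year'].astype(int)  # years must be numeric to add or subtract 1


def with_neighbors(df):
    """Hypothetical helper: keep only years t where t-1 and t+1 also exist."""
    keys = ['player_id', 'year_t']
    base = df.rename(columns={'year': 'year_t', 'homeruns': 'homeruns_t'})
    # A row for year y supplies the "t-1" value for year y + 1 ...
    prev = (df.assign(year_t=df['year'] + 1)
              .rename(columns={'homeruns': 'homeruns_t_minus_1'})
              [keys + ['homeruns_t_minus_1']])
    # ... and the "t+1" value for year y - 1.
    nxt = (df.assign(year_t=df['year'] - 1)
             .rename(columns={'homeruns': 'homeruns_t_plus_1'})
             [keys + ['homeruns_t_plus_1']])
    # Inner merges drop every year t that lacks either neighbour.
    out = base.merge(prev, on=keys).merge(nxt, on=keys)
    return out.reset_index(drop=True)


result = with_neighbors(df)
print(result)
```

With this data, player '3' produces no row because no year has both neighbours, whereas the shift-based version would pair 2017 with 2015 and 2018. The merge also does not depend on the rows being sorted.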