Question

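I'm working with some data where, for each horse, I want to list its finishing positions in its most recent runs (at most the 6 runs before the current race). The race date is given by 'race_id'.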

Is there a way to use groupby and agg but aggregate only the previous 6 values?

The DataFrame is as follows:

finishing_position  horse_id    race_id
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012
 2                  K01         2014021
 3                  K01         2014031
 1                  M01         2015011
 2                  K01         2016012
 1                  K02         2016012
 3                  M01         2016012
 4                  J01         2016012

I would like the result to be:

finishing_position  horse_id    race_id     recent
 1                  K01         2014011
 2                  K02         2014011
 3                  M01         2014011
 4                  K01         2014012     1
 2                  K01         2014021     1/4
 3                  K01         2014031     1/4/2
 1                  M01         2015011     3
 2                  K01         2016012     1/4/2/3
 1                  K02         2016012     2
 3                  M01         2016012     3/1
 4                  J01         2016012
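If the aggregation were numeric instead of a string join (for example, the mean of the previous finishing positions), the "only the previous 6 values" part can be expressed with groupby plus shift and rolling. This is a minimal sketch, assuming rows are ordered chronologically by race_id; the recent_mean column and the choice of mean are illustrative only.

```python
import pandas as pd

# Sample data from the question (already in race_id order).
df = pd.DataFrame({
    'finishing_position': [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4],
    'horse_id': ['K01', 'K02', 'M01', 'K01', 'K01', 'K01',
                 'M01', 'K01', 'K02', 'M01', 'J01'],
    'race_id': [2014011, 2014011, 2014011, 2014012, 2014021, 2014031,
                2015011, 2016012, 2016012, 2016012, 2016012],
})

N = 6  # look-back window: at most N earlier races per horse

# shift() excludes the current race, so only earlier races are used;
# rolling(N, min_periods=1) then aggregates at most N of those earlier values.
# A horse's first race has no earlier values and gets NaN.
df['recent_mean'] = (
    df.sort_values('race_id', kind='stable')
      .groupby('horse_id')['finishing_position']
      .transform(lambda s: s.shift().rolling(N, min_periods=1).mean())
)
print(df)
```

The transform result is aligned back to df by index label, so the sort does not reorder df itself.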

Answer 1

We can combine cumsum with groupby:

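# Turn each finishing position into a string with a trailing '/', e.g. 1 -> '1/'
df['recent'] = df.finishing_position.astype(str) + '/'
# Per horse: concatenate cumulatively, shift down one row so the current race is
# excluded, drop the trailing '/', and use '' for the horse's first race.
# transform keeps the original index (apply may prepend horse_id to the index in pandas 2.x)
df['recent'] = df.groupby('horse_id').recent.transform(lambda x: x.cumsum().shift().str[:-1].fillna(''))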
df
Out[140]: 
    finishing_position horse_id  race_id   recent
0                    1      K01  2014011         
1                    2      K02  2014011         
2                    3      M01  2014011         
3                    4      K01  2014012        1
4                    2      K01  2014021      1/4
5                    3      K01  2014031    1/4/2
6                    1      M01  2015011        3
7                    2      K01  2016012  1/4/2/3
8                    1      K02  2016012        2
9                    3      M01  2016012      3/1
10                   4      J01  2016012
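To see why the cumsum trick works, here is a step-by-step sketch on K01's positions alone (1, 4, 2, 3, 2). It assumes object-dtype strings, for which Series.cumsum concatenates with +; the variable names are illustrative.

```python
import pandas as pd

# K01's finishing positions in race order, each with a trailing '/'.
s = pd.Series(['1/', '4/', '2/', '3/', '2/'], dtype=object)

running = s.cumsum()        # '1/', '1/4/', '1/4/2/', ... (string concatenation)
previous = running.shift()  # move down one row so the current race is excluded
recent = previous.str[:-1].fillna('')  # drop the trailing '/'; first race -> ''

print(recent.tolist())  # ['', '1', '1/4', '1/4/2', '1/4/2/3']
```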

Answer 2

A revision of @Wen's answer that keeps at most N previous records.

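df['recent'] = df.finishing_position.astype(str) + '/'
# transform keeps the original index (apply may prepend horse_id to the index in pandas 2.x)
df['recent'] = df.groupby('horse_id').recent.transform(lambda x: x.cumsum().shift().str[:-1].fillna(''))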

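def last_n_record(string, recent_no):
    # Keep only the last recent_no '/'-separated entries of string
    count = string.count('/')
    if count + 1 >= recent_no:
        # Split off the oldest entries so exactly recent_no remain in the last piece
        return string.split('/', count - recent_no + 1)[-1]
    else:
        # Fewer than recent_no entries: keep everything
        return string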
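The same truncation can also be written by splitting on '/' and slicing. This is a sketch of an alternative, not part of the original answer; last_n_record_join is a hypothetical name and it assumes n >= 1.

```python
def last_n_record_join(record, n):
    # Hypothetical alternative to last_n_record; assumes n >= 1.
    # An empty record (no previous races) stays empty.
    parts = record.split('/') if record else []
    return '/'.join(parts[-n:])  # keep the n most recent entries


print(last_n_record_join('1/4/2/3', 3))  # 4/2/3
```

For the question's limit of 6, it would be applied as df.recent.apply(lambda x: last_n_record_join(x, 6)).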

recent_no = 3  # Let's take the 3 most recent records as a demo
df['recent'] = df.recent.apply(lambda x: last_n_record(x,recent_no))
df
    finishing_position horse_id  race_id recent
0                    1      K01  2014011       
1                    2      K02  2014011       
2                    3      M01  2014011       
3                    4      K01  2014012      1
4                    2      K01  2014021    1/4
5                    3      K01  2014031  1/4/2
6                    1      M01  2015011      3
7                    2      K01  2016012  4/2/3
8                    1      K02  2016012      2
9                    3      M01  2016012    3/1
10                   4      J01  2016012
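Instead of building the full history and then trimming it, the window can be applied directly to each horse's list of positions. This is a minimal sketch, assuming rows are chronological by race_id; previous_n_positions is a hypothetical helper, and N = 6 follows the question.

```python
import pandas as pd


def previous_n_positions(positions, n, sep='/'):
    # Hypothetical helper: for each race in one horse's history, join the
    # up-to-n finishing positions that came before it ('' for the first race).
    values = positions.astype(str).tolist()
    joined = [sep.join(values[max(0, i - n):i]) for i in range(len(values))]
    return pd.Series(joined, index=positions.index)


# Sample data from the question.
df = pd.DataFrame({
    'finishing_position': [1, 2, 3, 4, 2, 3, 1, 2, 1, 3, 4],
    'horse_id': ['K01', 'K02', 'M01', 'K01', 'K01', 'K01',
                 'M01', 'K01', 'K02', 'M01', 'J01'],
    'race_id': [2014011, 2014011, 2014011, 2014012, 2014021, 2014031,
                2015011, 2016012, 2016012, 2016012, 2016012],
})

N = 6
df['recent'] = (
    df.sort_values('race_id', kind='stable')
      .groupby('horse_id')['finishing_position']
      .transform(lambda s: previous_n_positions(s, N))
)
print(df)
```

With N = 3 the K01 row of race 2016012 becomes 4/2/3, matching the output above; no string parsing is needed afterwards.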

pandas DataFrame: aggregate a fixed number of rows
