DataFrame: how to compute a conditional rolling sum?

Time: 2020-07-19 11:38:04

Tags: python dataframe

I have a DataFrame of football data in which each row represents one match. The DataFrame has the following columns: "Date", "HomeTeam", "AwayTeam", "Points_HomeTeam", "Points_AwayTeam".

+--------------------------------------------------------------------------+
| 'Date'    'HomeTeam'   'AwayTeam'  'Points_HomeTeam' 'Points_AwayTeam'   |
+--------------------------------------------------------------------------+
| 2000-08-19 Charlton     Man City          0                 3            |
| 2000-08-19 Chelsea      Arsenal           1                 1            |
| 2000-08-23 Coventry     Man City          3                 0            |
| 2000-08-25 Man City     Liverpool         1                 1            |
| 2000-08-28 Derby        Man City          1                 1            |
| 2000-08-31 Leeds        Chelsea           3                 0            |
| 2000-08-31 Man City     Everton           3                 0            |
+--------------------------------------------------------------------------+

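I want to add a column that shows the sum of the points the HomeTeam earned in its two most recent away matches, i.e. the sum of the "Points_AwayTeam" values over the two previous rows in which "AwayTeam" equals the "HomeTeam" of the current row.

For example, in the table below, the new column has the value 3 for the first row in which "Man City" appears as "HomeTeam" (the sum of the "Points_AwayTeam" values over the two previous rows with "Man City" as "AwayTeam", i.e. 0 + 3). Likewise, the new column has the value 1 (1 + 0) for the second row in which "Man City" appears as "HomeTeam". All other rows get "NA", because their "HomeTeam" has not yet appeared twice in the "AwayTeam" column.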

+-------------------------------------------------------------------------------------+
| 'Date'    'HomeTeam'   'AwayTeam'  'Points_HomeTeam' 'Points_AwayTeam' 'New Column' |
+-------------------------------------------------------------------------------------+
| 2000-08-19 Charlton     Man City          0                 3          NA           |
| 2000-08-19 Chelsea      Arsenal           1                 1          NA           |
| 2000-08-23 Coventry     Man City          3                 0          NA           |
| 2000-08-25 Man City     Liverpool         1                 1          3            |
| 2000-08-28 Derby        Man City          1                 1          NA           |
| 2000-08-31 Leeds        Chelsea           3                 0          NA           |
| 2000-08-31 Man City     Everton           3                 0          1            |
+-------------------------------------------------------------------------------------+
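To make the rule concrete, here is a minimal brute-force sketch that builds the expected column with an explicit loop. It is meant only as a readable reference for the definition, not as an efficient solution; the helper name `last_away_points` and its `n_games` parameter are made up for this illustration, and the data is copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
        "2000-08-28", "2000-08-31", "2000-08-31",
    ]),
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})


def last_away_points(frame, n_games=2):
    """Hypothetical helper: for each row, sum the home team's points
    from its last n_games away matches played before that row's date."""
    values = []
    for _, row in frame.iterrows():
        # Earlier matches in which the current home team played away.
        earlier = frame[(frame["AwayTeam"] == row["HomeTeam"])
                        & (frame["Date"] < row["Date"])]
        recent = earlier.sort_values("Date").tail(n_games)
        # Fewer than n_games earlier away matches means no value.
        if len(recent) == n_games:
            values.append(recent["Points_AwayTeam"].sum())
        else:
            values.append(float("nan"))
    return pd.Series(values, index=frame.index)


df["New Column"] = last_away_points(df)
print(df)
```

On this data the two "Man City" home rows get 3 and 1, and every other row is NaN, matching the table.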

I managed to compute the sum of the points the "HomeTeam" earned in its two most recent home matches with the following code:

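rolling_games = 2  # number of previous home matches to sum
# Per-team rolling sum, shifted by one row so the current match is excluded.
f = lambda x: x.rolling(window=rolling_games, min_periods=rolling_games).sum().shift()
df['HomeTeam_HomePoints'] = df.groupby('HomeTeam')['Points_HomeTeam'].apply(f).reset_index(drop=True, level=0)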
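As a side note, the same per-team rolling sum can be sketched with `groupby(...).transform`, which returns a result aligned to the original index and so avoids the `reset_index` step. The small DataFrame below is made-up data for illustration only, not the data from the question.

```python
import pandas as pd

# Made-up data: only the two columns needed for the home rolling sum.
df = pd.DataFrame({
    "HomeTeam": ["Man City", "Chelsea", "Man City", "Chelsea", "Man City"],
    "Points_HomeTeam": [3, 1, 0, 3, 1],
})

rolling_games = 2  # number of previous home matches to sum

# transform keeps the original row index, so the result can be assigned
# directly. shift() moves each sum down one row within the team, so a
# row only sees matches played before it.
df["HomeTeam_HomePoints"] = df.groupby("HomeTeam")["Points_HomeTeam"].transform(
    lambda s: s.rolling(window=rolling_games, min_periods=rolling_games).sum().shift()
)
print(df)
```

Only the third "Man City" row has two earlier home matches, so it is the only row with a value (3 + 0 = 3); all other rows are NaN.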

How can I compute a rolling sum over rows that are selected by the value in a different column?

Thanks a lot!

1 answer:

Answer 0: (score: 0)

Here is a solution:

import pandas as pd

# Assumes df["Date"] has a datetime dtype and df is sorted by "Date"
# (merge_asof requires sorted keys).
away = df[["Date", "AwayTeam", "Points_AwayTeam"]].copy()

# Rolling sum of each team's points over its last two away matches.
away["roll_sum"] = away.groupby("AwayTeam")["Points_AwayTeam"].transform(lambda x: x.rolling(2).sum())

# For every match, we now have to find the most recent rolling sum
# in 'away' for the home team.
#
# We use merge_asof to do that:
# first it matches home teams on the left to away teams on the right
# (done via left_by and right_by); then, for every date on the left,
# it looks for the closest earlier-or-equal date on the right
# (done via the 'on' argument).
res = pd.merge_asof(df, away, on="Date", left_by="HomeTeam", right_by="AwayTeam", suffixes=["", "_roll"])
res.drop(["AwayTeam_roll", "Points_AwayTeam_roll"], axis=1, inplace=True)
print(res)

Output:

        Date  HomeTeam   AwayTeam  Points_HomeTeam  Points_AwayTeam  roll_sum
0 2000-08-19  Charlton   Man-City                0                3       NaN
1 2000-08-19   Chelsea    Arsenal                1                1       NaN
2 2000-08-23  Coventry   Man-City                3                0       NaN
3 2000-08-25  Man-City  Liverpool                1                1       3.0
4 2000-08-28     Derby   Man-City                1                1       NaN
5 2000-08-31     Leeds    Chelsea                3                0       NaN
6 2000-08-31  Man-City    Everton                3                0       1.0
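For anyone who wants to reproduce this, here is a self-contained sketch of the same `merge_asof` approach. It rebuilds the question's table by hand, converts "Date" to datetime and sorts by it first (both are assumptions about how the data was loaded), and selects the final columns by name instead of dropping the suffixed ones.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2000-08-19", "2000-08-19", "2000-08-23", "2000-08-25",
             "2000-08-28", "2000-08-31", "2000-08-31"],
    "HomeTeam": ["Charlton", "Chelsea", "Coventry", "Man City",
                 "Derby", "Leeds", "Man City"],
    "AwayTeam": ["Man City", "Arsenal", "Man City", "Liverpool",
                 "Man City", "Chelsea", "Everton"],
    "Points_HomeTeam": [0, 1, 3, 1, 1, 3, 3],
    "Points_AwayTeam": [3, 1, 0, 1, 1, 0, 0],
})

# merge_asof needs an ordered key: parse the dates and sort by them.
df["Date"] = pd.to_datetime(df["Date"])
df = df.sort_values("Date", kind="stable").reset_index(drop=True)

# Rolling sum of each team's points over its last two away matches.
away = df[["Date", "AwayTeam", "Points_AwayTeam"]].copy()
away["roll_sum"] = away.groupby("AwayTeam")["Points_AwayTeam"].transform(
    lambda s: s.rolling(2).sum()
)

# For each home match, take the latest away rolling sum of the same team
# dated on or before the match date.
res = pd.merge_asof(
    df,
    away[["Date", "AwayTeam", "roll_sum"]],
    on="Date",
    left_by="HomeTeam",
    right_by="AwayTeam",
    suffixes=("", "_roll"),
)

# Keep the original columns plus the new one.
res = res[list(df.columns) + ["roll_sum"]]
print(res)
```

No extra `shift()` is needed here: the home match itself never appears among the team's away rows, so the most recent away rolling sum already covers only earlier matches.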