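Append the average of the last three occurrences in a pandas DataFrame

Time: 2018-07-03 18:07:40

Tags: python sql pandas

I am working with the following dataframe, which has more than 54,000 rows: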

NBA daily fantasy dataframe

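For each row of the dataframe, I want to add the specific player's average "Draft_Kings_Points_Scored" against the specific opponent. I have tried this in both Python and SQL and can't seem to figure it out. If you know a way to do this, I would really appreciate the help.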

1 Answer:

Answer 0: (score: 0)

You can convert the data to a time series and then use pandas:

groupby().rolling().mean() 

Here is the code. First, make some data for testing:

import pandas as pd
import numpy as np
import string
from datetime import datetime

# set up matches, players,  tournament start and end
matches = 50000
players = list(string.ascii_uppercase)
# cast to int so np.random.randint receives integer bounds (epoch seconds)
start = int(datetime(2015, 1, 1).timestamp())
end = int(datetime(2018, 1, 1).timestamp())

# create a dataframe for testing
df = pd.DataFrame({
    'DATE': pd.to_datetime(np.random.randint(start, end, size=matches), unit='s'),
    'PLAYER': np.random.choice(players, matches),
    'OPPONENT': np.random.choice(players, matches),
    'SCORE': np.random.normal(100, 25, matches)
    })

# drop the cases where a player was drawn against themselves
df = df[df['PLAYER'] != df['OPPONENT']]

# make it a time series and ensure it is sorted
df.set_index('DATE', inplace=True)
df.sort_index(inplace=True)

df.head()


Use groupby().rolling().mean() on it:

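# select SCORE explicitly so only the numeric column is averaged;
# each window covers the current match and the two before it for that pairing
df_rolling = df.groupby(['PLAYER', 'OPPONENT'])['SCORE'].rolling(3).mean().reset_index()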
df_rolling.head()
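
To make the rolling step easier to follow, here is a minimal sketch on a hand-made frame (the `toy` data and its values are invented for illustration and are not from the question). With a window of 3, the first two games of each player/opponent pairing come back as NaN because the window is not yet full. Passing `min_periods=1` is one way to get a partial average instead, if that suits your use case.

```python
import pandas as pd

# hypothetical toy data: player A meets B four times and C once
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03',
                            '2018-01-04', '2018-01-05']),
    'PLAYER': ['A', 'A', 'A', 'A', 'A'],
    'OPPONENT': ['B', 'B', 'B', 'B', 'C'],
    'SCORE': [10.0, 20.0, 30.0, 40.0, 50.0],
}).set_index('DATE').sort_index()

grouped = toy.groupby(['PLAYER', 'OPPONENT'])['SCORE']

# full windows only: NaN until a pairing has three games
strict = grouped.rolling(3).mean().reset_index()

# partial windows allowed: averages whatever games exist so far (up to three)
partial = grouped.rolling(3, min_periods=1).mean().reset_index()

print(strict)
print(partial)
```

In `strict`, the A/C pairing never gets a value because it has only one game. In `partial` it simply keeps its single score.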

Merge it back into the original data, which has all the columns, and check a single matchup (player A against opponent B):

df_final = pd.merge(df, df_rolling, on=['PLAYER', 'OPPONENT', 'DATE'], suffixes=['_RAW', '_AVG3'])
df_final[df_final['PLAYER'].eq('A') & df_final['OPPONENT'].eq('B')].tail(10)
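
Note that `rolling(3)` includes the current game in its window. If what you need is the average of the three previous meetings only (one reading of the question title, and an assumption here), a possible variation is to shift the scores by one game within each pairing before averaging. The sketch below uses `transform`, which keeps the result aligned with the original rows, so no merge is needed. The `PREV3_AVG` column name and the toy data are hypothetical.

```python
import pandas as pd

# hypothetical toy data: five meetings between player A and opponent B
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03',
                            '2018-01-04', '2018-01-05']),
    'PLAYER': ['A'] * 5,
    'OPPONENT': ['B'] * 5,
    'SCORE': [10.0, 20.0, 30.0, 40.0, 50.0],
}).sort_values('DATE')

# shift(1) drops the current game from the window, so each row sees
# only the three earlier games of the same player/opponent pairing
toy['PREV3_AVG'] = (
    toy.groupby(['PLAYER', 'OPPONENT'])['SCORE']
       .transform(lambda s: s.shift(1).rolling(3).mean())
)

print(toy)
```

Here the fourth game gets the mean of games one to three, and the first three games stay NaN because fewer than three earlier meetings exist.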
