Question

I am working with the following dataframe, which contains more than 54,000 rows:

[image: NBA daily fantasy dataframe]

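I want to add a column to the dataframe that holds the average "Draft_Kings_Points_Scored" for a specific player against a specific opponent. I have tried this in both Python and SQL and can't seem to figure it out. If you know a way to do this, your help would be greatly appreciated.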

Answer 1

You can convert the data into a time series and then use pandas:

groupby().rolling().mean()
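
To make the idea concrete before the full example, here is a minimal sketch on a tiny hand-made frame. The data and the names `toy` and `toy_rolling` are made up for illustration and are not from the question. With a window of 3, a row only gets a value once its (PLAYER, OPPONENT) pair has three games, so here the third and fourth A-vs-B games get 20.0 and 30.0 and every other row is NaN.

```python
import pandas as pd

# Hypothetical toy data: one player facing two opponents on several dates.
toy = pd.DataFrame({
    'DATE': pd.to_datetime(['2017-01-01', '2017-01-05', '2017-01-09',
                            '2017-01-13', '2017-01-17', '2017-01-21']),
    'PLAYER': ['A'] * 6,
    'OPPONENT': ['B', 'B', 'C', 'B', 'B', 'C'],
    'SCORE': [10.0, 20.0, 50.0, 30.0, 40.0, 70.0],
}).set_index('DATE').sort_index()

# Mean of the last 3 scores within each (PLAYER, OPPONENT) pair.
# The result is a Series indexed by (PLAYER, OPPONENT, DATE).
toy_rolling = (
    toy.groupby(['PLAYER', 'OPPONENT'])['SCORE']
       .rolling(3)
       .mean()
)
print(toy_rolling)
```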

Here is the code. First, make some data for testing:

import pandas as pd
import numpy as np
import string
from datetime import datetime

# set up the number of matches, the players, and the tournament start and end
matches = 50000
players = list(string.ascii_uppercase)
# np.random.randint expects integer bounds, so cast the float timestamps
start = int(datetime(2015, 1, 1).timestamp())
end = int(datetime(2018, 1, 1).timestamp())

# create a dataframe for testing
df = pd.DataFrame({
    'DATE': pd.to_datetime(np.random.randint(start, end, size=matches), unit='s'),
    'PLAYER': np.random.choice(players, matches),
    'OPPONENT': np.random.choice(players, matches),
    'SCORE': np.random.normal(100, 25, matches)
    })

# drop the cases where a player was matched against themselves
df = df[df['PLAYER'] != df['OPPONENT']]

# make it a time series and ensure it is sorted
df.set_index('DATE', inplace=True)
df.sort_index(inplace=True)

df.head()

Use groupby().rolling().mean() on it:

# select SCORE explicitly so only the numeric column is averaged
df_rolling = df.groupby(['PLAYER', 'OPPONENT'])['SCORE'].rolling(3).mean().reset_index()
df_rolling.head()
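
Note that rolling(3) leaves NaN for the first two games of every pairing. If a partial average is acceptable there, one possible variant (a sketch, not part of the original answer) is to pass `min_periods=1` and use `transform`, which returns values aligned to the original rows so no merge is needed. The frame `games` and the column name `SCORE_AVG3` below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, sorted by date with a plain integer index.
games = pd.DataFrame({
    'DATE': pd.to_datetime(['2017-01-01', '2017-01-05', '2017-01-09',
                            '2017-01-13', '2017-01-17']),
    'PLAYER': ['A'] * 5,
    'OPPONENT': ['B', 'B', 'C', 'B', 'B'],
    'SCORE': [10.0, 20.0, 50.0, 30.0, 40.0],
}).sort_values('DATE').reset_index(drop=True)

# transform keeps the original row order, so the result can be assigned
# directly as a new column. min_periods=1 averages whatever games exist
# so far instead of returning NaN for the first two of each pairing.
games['SCORE_AVG3'] = (
    games.groupby(['PLAYER', 'OPPONENT'])['SCORE']
         .transform(lambda s: s.rolling(3, min_periods=1).mean())
)
print(games)
```

Whether a one-game or two-game "average" is meaningful depends on what the column is used for; keeping the NaN values is the more conservative choice.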

Join it back onto the original data, which has all the columns, and check one matchup (A vs. B):

df_final = pd.merge(df, df_rolling, on=['PLAYER', 'OPPONENT', 'DATE'], suffixes=['_RAW', '_AVG3'])
df_final[df_final['PLAYER'].eq('A') & df_final['OPPONENT'].eq('B')].tail(10)
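
As a sanity check, the merged average for the most recent game of a pairing should equal the plain mean of that pairing's last three raw scores. The sketch below rebuilds a small random frame so it runs on its own. The names `small`, `rolling3`, `merged`, and `pair` are illustrative, and the dates are generated to be unique so the merge matches each row exactly once.

```python
import numpy as np
import pandas as pd

# Hypothetical small data set: 200 games on distinct days.
rng = np.random.default_rng(0)
n = 200
small = pd.DataFrame({
    'DATE': pd.date_range('2017-01-01', periods=n, freq='D'),
    'PLAYER': rng.choice(list('AB'), n),
    'OPPONENT': rng.choice(list('CD'), n),
    'SCORE': rng.normal(100, 25, n),
})

# Rolling 3-game mean per (PLAYER, OPPONENT), flattened back to columns.
rolling3 = (
    small.set_index('DATE')
         .groupby(['PLAYER', 'OPPONENT'])['SCORE']
         .rolling(3)
         .mean()
         .reset_index()
)

# Same join as in the answer: raw score and rolling average side by side.
merged = pd.merge(small, rolling3, on=['PLAYER', 'OPPONENT', 'DATE'],
                  suffixes=('_RAW', '_AVG3'))

# Compare the last rolling value with a manual mean for one pairing.
pair = merged[merged['PLAYER'].eq('A') & merged['OPPONENT'].eq('C')].sort_values('DATE')
manual = pair['SCORE_RAW'].tail(3).mean()
print(pair['SCORE_AVG3'].iloc[-1], manual)
```

If the real data can contain two rows with the same player, opponent, and date, the merge would duplicate rows, so it is worth comparing `len(merged)` with the length of the original frame.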

Append average of last three occurrences from pandas dataframe
