I am working with the following dataframe, which has more than 54,000 rows:
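I want to add a column to the dataframe that holds the average "Draft_Kings_Points_Scored" of a given player against a given opponent. I have tried this in both Python and SQL and can't seem to figure it out. If you know a way to do this, I would really appreciate the help.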
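If "average" means the mean over all of a player's games against that opponent, a grouped `transform('mean')` writes each pair's mean onto every row of that pair. A minimal sketch; the column names `Player` and `Opponent`, the new column name, and the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; only Draft_Kings_Points_Scored comes from the question
df = pd.DataFrame({
    "Player": ["A", "A", "A", "B", "B"],
    "Opponent": ["X", "X", "Y", "X", "X"],
    "Draft_Kings_Points_Scored": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Mean points for each (Player, Opponent) pair, repeated on every row of that pair
df["Avg_Points_vs_Opponent"] = (
    df.groupby(["Player", "Opponent"])["Draft_Kings_Points_Scored"].transform("mean")
)
print(df)
```

Unlike an aggregation with `.mean()`, `transform` returns one value per original row, so the result can be assigned directly as a new column.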
Answer (score: 0)
You can turn the data into a time series and then use pandas' `groupby().rolling().mean()`.
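To show what `groupby().rolling().mean()` computes, here is a tiny hand-made frame with a single player/opponent pair (the values are made up, and a window of 2 is used instead of 3 to keep it short). Each game gets the mean of itself and the previous game of the same pair; the first game has no full window and becomes NaN:

```python
import pandas as pd

# Toy data: one player/opponent pair, four games in date order
toy = pd.DataFrame(
    {
        "PLAYER": ["A"] * 4,
        "OPPONENT": ["B"] * 4,
        "SCORE": [10.0, 20.0, 30.0, 40.0],
    },
    index=pd.to_datetime(["2015-01-01", "2015-02-01", "2015-03-01", "2015-04-01"]),
)
toy.index.name = "DATE"

# Mean of each game's score and the previous one, computed separately per pair;
# the result is indexed by (PLAYER, OPPONENT, DATE)
rolled = toy.groupby(["PLAYER", "OPPONENT"])["SCORE"].rolling(2).mean()
print(rolled)
```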
Here is the full code. First, create some test data:
```python
import pandas as pd
import numpy as np
import string
from datetime import datetime

# set up matches, players, tournament start and end (as integer Unix timestamps)
matches = 50000
players = list(string.ascii_uppercase)
start = int(datetime(2015, 1, 1).timestamp())
end = int(datetime(2018, 1, 1).timestamp())

# create a dataframe for testing
df = pd.DataFrame({
    'DATE': pd.to_datetime(np.random.randint(start, end, size=matches), unit='s'),
    'PLAYER': np.random.choice(players, matches),
    'OPPONENT': np.random.choice(players, matches),
    'SCORE': np.random.normal(100, 25, matches)
})

# drop the cases where the player played themselves
df = df[df['PLAYER'] != df['OPPONENT']]

# make it a time series and ensure it is sorted
df = df.set_index('DATE').sort_index()
df.head()
```
Then apply `groupby().rolling().mean()` to it:
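```python
# 3-game rolling mean of SCORE within each player/opponent pair;
# reset_index() turns the (PLAYER, OPPONENT, DATE) index back into columns
df_rolling = df.groupby(['PLAYER', 'OPPONENT'])['SCORE'].rolling(3).mean().reset_index()
df_rolling.head()
```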
Join it back to the original data, which has all the columns, and check one matchup (A vs. B):
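```python
# move DATE out of the index so all three join keys are ordinary columns
df_final = pd.merge(df.reset_index(), df_rolling, on=['PLAYER', 'OPPONENT', 'DATE'], suffixes=('_RAW', '_AVG3'))
df_final[df_final['PLAYER'].eq('A') & df_final['OPPONENT'].eq('B')].tail(10)
```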
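If the goal is only to attach the average to each row, the merge can be skipped: `transform` returns a result aligned with the original rows. Note also that the 3-game window includes the current game's own score. If the column is meant as a pre-game estimate (an assumption about the use case), one option is an expanding mean shifted down by one game, so each row sees only earlier meetings of the pair. A sketch on a small made-up frame in the same layout as the test data, assuming it is already sorted by date:

```python
import pandas as pd

# Hypothetical mini frame in the same layout as the test data (DATE index, sorted)
df = pd.DataFrame(
    {
        "PLAYER": ["A", "A", "C", "A", "A"],
        "OPPONENT": ["B", "B", "B", "B", "B"],
        "SCORE": [90.0, 110.0, 50.0, 100.0, 120.0],
    },
    index=pd.date_range("2015-01-01", periods=5, freq="D", name="DATE"),
)

grouped = df.groupby(["PLAYER", "OPPONENT"])["SCORE"]

# Rolling 3-game mean (includes the current game), written straight back onto the rows
df["SCORE_AVG3"] = grouped.transform(lambda s: s.rolling(3).mean())

# Mean of all earlier meetings only: expanding mean, then shifted down by one game
df["SCORE_AVG_PRIOR"] = grouped.transform(lambda s: s.expanding().mean().shift(1))
print(df)
```

The first game of each pair gets NaN in `SCORE_AVG_PRIOR` because there is no earlier meeting to average.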