Here is a sample of my data:
In [177]: df_data[['Date', 'TeamName', 'Opponent', 'ScoreOff']].head()
Out[177]:
                     Date              TeamName              Opponent  ScoreOff
4128  2005-09-08 00:00:00  New England Patriots       Oakland Raiders        30
4129  2005-09-08 00:00:00       Oakland Raiders  New England Patriots        20
4130  2005-09-11 00:00:00     Arizona Cardinals       New York Giants        19
4131  2005-09-11 00:00:00      Baltimore Ravens    Indianapolis Colts         7
4132  2005-09-11 00:00:00         Buffalo Bills        Houston Texans        22
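For each row, I need to set a new column ['OpponentScoreOff'] equal to the ScoreOff of that team's opponent on that date.
I have basically done it the following way, but it is slow, and I feel there must be a more pythonic / vectorized way to do it.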
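g1 = df_data.groupby('Date')
for date, teams in g1:
    g2 = teams.groupby('TeamName')
    for teamname, game in g2:
        # Rows where this team played on this date, and the opponent's rows
        is_team = (df_data['TeamName'] == teamname) & (df_data['Date'] == date)
        is_opp = (df_data['Opponent'] == teamname) & (df_data['Date'] == date)
        # .loc avoids chained assignment; .values drops the index so nothing is realigned
        df_data.loc[is_team, 'OppScoreOff'] = df_data.loc[is_opp, 'ScoreOff'].values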
It works, but it is slow. Is there a better way?
Answer 0 (score: 0)
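You can use sort_values (DataFrame.sort in older versions of pandas) to exploit the two-way correspondence between TeamName and Opponent on any given date: every team that appears in TeamName on a date also appears exactly once in Opponent on that date. Consider the following: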
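import pandas as pd
import numpy as np

# Order the rows by (Date, TeamName)
df_data = df_data.sort_values(['Date', 'TeamName'])
# Sorting by (Date, Opponent) puts each opponent's row in the matching position
opp_score = np.array(df_data.sort_values(['Date', 'Opponent'])['ScoreOff'])
df_data['OpponentScoreOff'] = opp_score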
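The np.array call is needed to strip the DataFrame index. That way, when the values are assigned back into df_data, they are placed by position instead of being realigned to the original index.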
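To make the positional trick easier to check, here is a small self-contained sketch. It assumes a recent pandas where sort_values and the validate argument of merge are available. The first two rows come from the question; the "Team A" / "Team B" rows are hypothetical placeholders added only so that there is a second date. It also shows a self-merge as an alternative (not part of the original answer) that does not rely on row order and raises an error if a team appears twice on the same date.

```python
import pandas as pd

# First two rows are from the question; the Team A / Team B rows are made up.
df_data = pd.DataFrame({
    "Date": pd.to_datetime(["2005-09-08", "2005-09-08", "2005-09-11", "2005-09-11"]),
    "TeamName": ["New England Patriots", "Oakland Raiders", "Team A", "Team B"],
    "Opponent": ["Oakland Raiders", "New England Patriots", "Team B", "Team A"],
    "ScoreOff": [30, 20, 17, 10],
})

# Sort trick: sort once by team and once by opponent, then assign by position.
by_team = df_data.sort_values(["Date", "TeamName"]).reset_index(drop=True)
by_opp = df_data.sort_values(["Date", "Opponent"])
by_team["OpponentScoreOff"] = by_opp["ScoreOff"].to_numpy()

# Alternative: self-merge. Relabel each team's own row as "the opponent's score"
# and join it back on (Date, Opponent).
opp = df_data[["Date", "TeamName", "ScoreOff"]].rename(
    columns={"TeamName": "Opponent", "ScoreOff": "OpponentScoreOff"}
)
merged = df_data.merge(
    opp, on=["Date", "Opponent"], how="left", validate="many_to_one"
)

print(merged)
```

On this sample both approaches give the same column. The merge is somewhat more defensive: a game with a missing counterpart row yields NaN for that row instead of silently shifting every later value by one position.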