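I have a data frame of basketball scores for one season, and I want to find the number of days each team had between its games during the season.
Example frame: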
import datetime
import pandas as pd

# One row per game: the two teams, each team's game number, and the date.
testDateFrame = pd.DataFrame({'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
                              'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
                              'HomeGameNum': [1, 2, 2, 2],
                              'AwayGameNum': [1, 1, 3, 3],
                              'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12), datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)]})
The output I want is:
AwayGameNum AwayTeam Date HomeGameNum HomeTeam AwayRest HomeRest
1 CHI 2014-03-11 1 HOU nan nan
1 DAL 2014-03-12 2 CHI nan 0
3 CHI 2014-03-14 2 DAL 1 1
3 DAL 2014-03-15 2 HOU 0 3
The AwayRest and HomeRest columns are the number of days between consecutive games for the AwayTeam and the HomeTeam respectively, minus 1.
Answer 0 (score: 4)
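I would adjust your data layout slightly so that it conforms to Hadley Wickham's definition of Tidy Data. This makes the calculation much simpler. Drop the AwayTeam and HomeTeam columns and create a single Team column instead. Then create a boolean column (HomeTeam) that records whether that team is the home team.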
Note: I did not change AwayGameNum and HomeGameNum, so those numbers do not match your desired output. The method still works.
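The answer does not show how the reshaped frame was built. One possible way, as a minimal sketch: split the question's frame into an away copy and a home copy, rename the team column in each to Team, and interleave them. The names games, away and home are assumptions made for illustration; only Team and the boolean HomeTeam come from the answer.

```python
import datetime
import pandas as pd

# The question's example frame (wide layout: one row per game).
games = pd.DataFrame({
    'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
    'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
    'HomeGameNum': [1, 2, 2, 2],
    'AwayGameNum': [1, 1, 3, 3],
    'Date': [datetime.date(2014, 3, 11), datetime.date(2014, 3, 12),
             datetime.date(2014, 3, 14), datetime.date(2014, 3, 15)],
})

# One copy of the rows per side: 'Team' holds that side's team and the
# boolean 'HomeTeam' records which side it was.
away = games.drop(columns='HomeTeam').rename(columns={'AwayTeam': 'Team'})
away['HomeTeam'] = False
home = games.drop(columns='AwayTeam').rename(columns={'HomeTeam': 'Team'})
home['HomeTeam'] = True

# Interleave the two copies so each game's away row precedes its home row.
# A stable sort on the original row labels keeps that order.
df = (pd.concat([away, home])
        .sort_index(kind='mergesort')
        .reset_index(drop=True))
print(df)
```

The game-number columns are carried along unchanged, as in the answer, so they still describe the game rather than the team in each row.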
In [34]: df
Out[34]:
AwayGameNum Team Date HomeGameNum HomeTeam
0 1 CHI 2014-03-11 1 False
1 1 HOU 2014-03-11 1 True
2 1 DAL 2014-03-12 2 False
3 1 CHI 2014-03-12 2 True
4 3 CHI 2014-03-14 2 False
5 3 DAL 2014-03-14 2 True
6 3 DAL 2014-03-15 2 False
7 3 HOU 2014-03-15 2 True
[8 rows x 5 columns]
In [62]: rest = df.groupby(['Team'])['Date'].diff() - datetime.timedelta(1)
In [63]: df['HomeRest'] = rest[df.HomeTeam]
In [64]: df['AwayRest'] = rest[~df.HomeTeam]
In [65]: df
Out[65]:
AwayGameNum Team Date HomeGameNum HomeTeam HomeRest AwayRest
0 1 CHI 2014-03-11 1 False NaT NaT
1 1 HOU 2014-03-11 1 True NaT NaT
2 1 DAL 2014-03-12 2 False NaT NaT
3 1 CHI 2014-03-12 2 True 0 days NaT
4 3 CHI 2014-03-14 2 False NaT 1 days
5 3 DAL 2014-03-14 2 True 1 days NaT
6 3 DAL 2014-03-15 2 False NaT 0 days
7 3 HOU 2014-03-15 2 True 3 days NaT
[8 rows x 7 columns]
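If the original one-row-per-game layout with integer rest days is what you need, the same idea can be carried back to the wide frame. The following is a sketch under some assumptions, not part of the answer above: dates are stored as pandas datetimes (via pd.to_datetime) rather than datetime.date objects so that the .dt accessor is available, and the names tidy, game, side and Rest are made up for illustration.

```python
import pandas as pd

# Wide layout from the question, with the dates as pandas datetimes.
games = pd.DataFrame({
    'HomeTeam': ['HOU', 'CHI', 'DAL', 'HOU'],
    'AwayTeam': ['CHI', 'DAL', 'CHI', 'DAL'],
    'Date': pd.to_datetime(['2014-03-11', '2014-03-12',
                            '2014-03-14', '2014-03-15']),
})

# Long layout: one row per (game, team). 'game' keeps the original row
# label and 'side' records which column the team came from.
tidy = (games.reset_index()
             .rename(columns={'index': 'game'})
             .melt(id_vars=['game', 'Date'],
                   value_vars=['HomeTeam', 'AwayTeam'],
                   var_name='side', value_name='Team')
             .sort_values(['Date', 'game']))

# Days since the same team's previous game, minus one, as numbers.
# A team's first game has no previous game, so it gets NaN.
tidy['Rest'] = tidy.groupby('Team')['Date'].diff().dt.days - 1

# Spread the per-team values back onto the one-row-per-game frame.
rest = tidy.pivot(index='game', columns='side', values='Rest')
games['HomeRest'] = rest['HomeTeam']
games['AwayRest'] = rest['AwayTeam']
print(games)
```

The rest columns come out as floats because NaN marks each team's first game. Sorting by date before the diff matters: the difference is taken between consecutive rows within each team, so the rows must be in chronological order.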