I have a Pandas DataFrame like this:
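from datetime import timedelta
import numpy as np
import pandas as pd
# pd.np was deprecated in pandas 1.0 and removed in 2.0; use numpy directly
df = pd.DataFrame({'Team': np.random.choice(['CHI', 'DAL'], 20),
                   'Date': pd.date_range('2014-11-01', '2014-11-20')})
df.drop(14, inplace=True)  # drop 2014-11-15 so there is a gap in the dates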
df
Date Team
0 2014-11-01 DAL
1 2014-11-02 CHI
2 2014-11-03 CHI
3 2014-11-04 DAL
4 2014-11-05 CHI
5 2014-11-06 CHI
6 2014-11-07 DAL
7 2014-11-08 DAL
8 2014-11-09 DAL
9 2014-11-10 DAL
10 2014-11-11 CHI
11 2014-11-12 CHI
12 2014-11-13 CHI
13 2014-11-14 CHI
# Row 14 (2014-11-15) was dropped, so no team plays that day.
15 2014-11-16 CHI
16 2014-11-17 CHI
17 2014-11-18 CHI
18 2014-11-19 CHI
19 2014-11-20 DAL
I want to find the number of consecutive days a team has played.
Answer 0 (score: 1)
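The following should be more efficient. Basically, I group by Team and apply a boolean test to check whether the difference between consecutive dates equals a one-day timedelta. Then, on the rows where that is True, I apply cumsum and add 1. Finally, I fill in the remaining NaN values with 1: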
In [51]:
# DataFrame.sort was removed; sort_values replaces it. diff() per team keeps the original index, so the result aligns on assignment.
df['consec_days'] = df.sort_values('Date').groupby('Team')['Date'].diff() == pd.Timedelta(days=1)
mask = df['consec_days']
df.loc[mask, 'n_days'] = df.loc[mask].groupby('Team')['consec_days'].cumsum() + 1
df['n_days'] = df['n_days'].fillna(1)
df
Out[51]:
Date Team consec_days n_days
index
0 2014-11-01 DAL False 1
1 2014-11-02 CHI False 1
2 2014-11-03 DAL False 1
3 2014-11-04 CHI False 1
4 2014-11-05 DAL False 1
5 2014-11-06 DAL True 2
6 2014-11-07 DAL True 3
7 2014-11-08 DAL True 4
8 2014-11-09 CHI False 1
9 2014-11-10 DAL False 1
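One caveat about the code above: the cumsum never resets within a team, so a streak that follows a gap keeps counting from the previous streak. With the sample shown in the question, CHI on 2014-11-06 would get 3 instead of 2. Below is a minimal sketch of a per-team version that does reset, assuming one row per team per date and a datetime `Date` column (`consecutive_days` is a hypothetical helper name). It marks every game that is not exactly one day after that team's previous game, takes a running count of those marks within each team to label each streak, and then numbers the rows within each streak.

```python
import pandas as pd

def consecutive_days(df):
    """Hypothetical helper: for each row, count how many consecutive days
    that row's team has played, resetting after any skipped day.
    Assumes one row per (Team, Date) and a datetime64 'Date' column."""
    out = df.sort_values('Date')
    # True where this game is NOT exactly one day after the team's previous game
    # (the first game of each team has a NaT diff, which also compares as True)
    new_streak = out.groupby('Team')['Date'].diff() != pd.Timedelta(days=1)
    # Running count of streak starts within each team gives each streak an id
    streak_id = new_streak.astype(int).groupby(out['Team']).cumsum().rename('streak')
    # Number the rows inside each (Team, streak) group: 1, 2, 3, ...
    out['n_days'] = out.groupby(['Team', streak_id]).cumcount() + 1
    # Restore the caller's original row order
    return out.sort_index()
```

Usage: `df = consecutive_days(df)`. Grouping the running count by team (rather than taking one global cumsum) keeps streaks intact even when several teams play on the same date.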
Answer 1 (score: 0)
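To my knowledge, the only way to do this is by iterating. It is not as efficient as a vectorized function, but since you need to carry information from one row to the next, I don't think vectorization is possible.
So I came up with this algorithm: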
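from datetime import datetime

n_days_in_row_played = 1
last_team = ""
last_date = datetime(1, 1, 1)  # placeholder; never used on the first row, because its team always differs from ""
n_days = []
# Assumes df is sorted by Date
for _, (date, team) in df[['Date', 'Team']].iterrows():
    # A different team, or a gap of more than one day, breaks the streak
    if team != last_team or (date - last_date).days > 1:
        last_team = team
        n_days_in_row_played = 1
    else:
        n_days_in_row_played += 1
    n_days.append(n_days_in_row_played)
    last_date = date
df['n_days'] = n_days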
df
Date Team n_days
0 2014-11-01 DAL 1
1 2014-11-02 CHI 1
2 2014-11-03 CHI 2
3 2014-11-04 DAL 1
4 2014-11-05 CHI 1
5 2014-11-06 CHI 2
6 2014-11-07 DAL 1
7 2014-11-08 DAL 2
8 2014-11-09 DAL 3
9 2014-11-10 DAL 4
10 2014-11-11 CHI 1
11 2014-11-12 CHI 2
12 2014-11-13 CHI 3
13 2014-11-14 CHI 4
# Skipped day resets the count.
15 2014-11-16 CHI 1
16 2014-11-17 CHI 2
17 2014-11-18 CHI 3
18 2014-11-19 CHI 4
19 2014-11-20 DAL 1
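We remember which team played last and when the last game was.
For each new row, we check whether the team changed or whether more than one day has passed. If either is true, the streak is broken, so we reset the count.
Otherwise, we add one to the number of consecutive days.
Finally, we append the count for that date to a list, which we can then attach to the original DataFrame as a new column.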
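For comparison, the same rule the loop applies (a row starts a new streak when the team changes or more than one day has passed since the previous row) can also be sketched without an explicit Python loop. Like the loop, this assumes `df` is already sorted by `Date`, and it assumes the dates carry no time-of-day component; `streaks_in_row_order` is a hypothetical helper name.

```python
import pandas as pd

def streaks_in_row_order(df):
    """Hypothetical helper mirroring the loop's rule: a row starts a new streak
    when its Team differs from the previous row's Team, or when its Date is more
    than one day after the previous row's Date. Assumes df is sorted by Date."""
    team_changed = df['Team'] != df['Team'].shift()          # first row compares to NaN -> True
    gap = df['Date'].diff() > pd.Timedelta(days=1)           # first row is NaT -> False
    # Every True starts a new streak; the running sum labels each streak
    streak_id = (team_changed | gap).cumsum()
    # Number the rows within each streak, starting at 1
    return df.groupby(streak_id).cumcount() + 1
```

Usage: `df['n_days'] = streaks_in_row_order(df)`. The `(s != s.shift()).cumsum()` pattern gives every run its own id, and `cumcount` then plays the role of the loop's counter.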