Count over the last n days per group

Time: 2016-01-21 15:42:33

Tags: python datetime pandas

I have a DataFrame like this:

import datetime
import pandas as pd

df = pd.DataFrame({'Team': ['CHI', 'IND', 'CHI', 'CHI', 'IND', 'CHI', 'CHI', 'IND'],
                   'Date': [datetime.date(2015, 10, 27), datetime.date(2015, 10, 28),
                            datetime.date(2015, 10, 29), datetime.date(2015, 10, 30),
                            datetime.date(2015, 11, 1), datetime.date(2015, 11, 2),
                            datetime.date(2015, 11, 4), datetime.date(2015, 11, 4)]})

I can find the number of rest days between games using this:

df['TeamRest'] = df.groupby('Team')['Date'].diff() - datetime.timedelta(1)
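To make the rest-day calculation concrete, here is a minimal sketch of the same groupby/diff idea. It assumes a recent pandas version and, unlike the snippet above, stores the dates as datetime64 values via pd.to_datetime so that the result has the .dt accessor; the name rest_days is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["CHI", "IND", "CHI", "CHI", "IND", "CHI", "CHI", "IND"],
    # Assumption: dates are parsed to datetime64 rather than kept as datetime.date objects.
    "Date": pd.to_datetime([
        "2015-10-27", "2015-10-28", "2015-10-29", "2015-10-30",
        "2015-11-01", "2015-11-02", "2015-11-04", "2015-11-04",
    ]),
})

# Gap between consecutive games of the same team, minus one day so that
# back-to-back games count as zero rest days.
df["TeamRest"] = df.groupby("Team")["Date"].diff() - pd.Timedelta(days=1)

# Whole days as numbers; NaN marks each team's first game (no previous game).
rest_days = df["TeamRest"].dt.days
print(df.assign(rest_days=rest_days))
```

Each team's first game has no predecessor, so its rest value is missing rather than zero.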

I would also like to add a column to the DataFrame that tracks how many games each team has played in the last 5 days.

1 answer:

Answer 0 (score: 4)

Convert Date to datetime so that it can be used as a DatetimeIndex, which is important for rolling_count with daily frequency:

from datetime import timedelta

df.Date = pd.to_datetime(df.Date)

1) Calculate the difference in days between games for each team:

df['days_between'] = df.groupby('Team')['Date'].diff() - timedelta(days=1)

2) Calculate the rolling count of games over the last 5 days for each team:

df['game_count'] = 1
rolling_games_count = df.set_index('Date').groupby('Team').apply(lambda x: pd.rolling_count(x, window=5, freq='D')).reset_index()
df = df.drop('game_count', axis=1).merge(rolling_games_count, on=['Team', 'Date'], how='left')

to get:

        Date Team  days_between  game_count
0 2015-10-27  CHI           NaT           1
1 2015-10-28  IND           NaT           1
2 2015-10-29  CHI        1 days           2
3 2015-10-30  CHI        0 days           3
4 2015-11-01  IND        3 days           2
5 2015-11-02  CHI        2 days           3
6 2015-11-04  CHI        1 days           2
7 2015-11-04  IND        2 days           2
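The pd.rolling_count function used in this answer was deprecated in pandas 0.18 and is not available in current releases. As a rough equivalent, and not the answer author's own code, the sketch below assumes a recent pandas version and uses a time-based window, rolling('5D'), on a DatetimeIndex within each team. The helper column game is only there to give the window something to count.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["CHI", "IND", "CHI", "CHI", "IND", "CHI", "CHI", "IND"],
    "Date": pd.to_datetime([
        "2015-10-27", "2015-10-28", "2015-10-29", "2015-10-30",
        "2015-11-01", "2015-11-02", "2015-11-04", "2015-11-04",
    ]),
})
df["game"] = 1  # one game per row

# Per team, count the games whose date falls in the 5-day window ending at
# (and including) the current game's date. Dates must be sorted within each team.
counts = (
    df.set_index("Date")
      .groupby("Team")["game"]
      .rolling("5D")
      .count()
      .rename("game_count")
      .reset_index()  # columns: Team, Date, game_count
)

# Attach the counts to the original rows, keeping the original row order.
df = df.drop(columns="game").merge(counts, on=["Team", "Date"], how="left")
print(df)
```

With the default window closure, a game played exactly five days earlier falls outside the window, so the count covers the current day and the four days before it, which matches the game_count column shown above.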

If you were to run this instead:

from datetime import date

df = pd.DataFrame({'Team':['CHI','IND','CHI','CHI','IND','CHI','CHI','IND'], 'Date': [date(2015,10,27),date(2015,10,28),date(2015,10,29),date(2015,10,30),date(2015,11,1),date(2015,11,2),date(2015,11,4),date(2015,12,10)]})
df['game'] = 1  # one game per row, used as the value to count
df['nb_games'] = df.groupby('Team')['game'].apply(pd.rolling_count, 5)

you get a surprising result (with one Date changed to a month later):

        Date Team  game  nb_games
0 2015-10-27  CHI     1         1
2 2015-10-29  CHI     1         2
3 2015-10-30  CHI     1         3
5 2015-11-02  CHI     1         4
6 2015-11-04  CHI     1         5
1 2015-10-28  IND     1         1
4 2015-11-01  IND     1         2
7 2015-12-10  IND     1         3

with nb_games=3 for the later date in December, even though there were no games in the previous five days. Unless you convert to datetime, you are only counting the last five entries in the DataFrame, so you will always get five for teams that have played more than five games.
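The difference between counting rows and counting days can be checked side by side. The following is a sketch under the assumption of a recent pandas version, with rolling(5, min_periods=1) standing in for the removed pd.rolling_count(x, 5); the column names nb_games_rows and nb_games_days are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Team": ["CHI", "IND", "CHI", "CHI", "IND", "CHI", "CHI", "IND"],
    "Date": pd.to_datetime([
        "2015-10-27", "2015-10-28", "2015-10-29", "2015-10-30",
        "2015-11-01", "2015-11-02", "2015-11-04", "2015-12-10",
    ]),
})
df["game"] = 1  # one game per row

# Row-based window: the last 5 rows per team, regardless of how far apart they are.
df["nb_games_rows"] = df.groupby("Team")["game"].transform(
    lambda s: s.rolling(5, min_periods=1).count()
)

# Time-based window: only games within the 5 days ending at the current date.
by_days = df.set_index("Date").groupby("Team")["game"].transform(
    lambda s: s.rolling("5D").count()
)
df["nb_games_days"] = by_days.to_numpy()  # same row order as df

print(df)
```

For the IND game on 2015-12-10 the row-based count is 3, while the time-based count is 1 because only that game itself falls inside the five-day window.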