pandas: detect the first/last record number using only the weekday of the timestamps

Date: 2015-12-07 00:39:26

Tags: python pandas dataframe

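I have a DataFrame with a column of record numbers (in ascending order) and a column of weekdays. The plan is to extract the first and last record number of each day, where a "day" is a consecutive run of rows with the same weekday (so the two separate Monday runs are treated separately). For example: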

import pandas as pd

df = pd.DataFrame({'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
                   'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                               'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                               'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday']})
>>> df

    records    weekday
0         1     Monday
1         2     Monday
2         3     Monday
3         4    Tuesday
4         6    Tuesday
5         7  Wednesday
6         8   Thursday
7        12   Thursday
8        14   Thursday
9        15     Friday
10       16     Friday
11       19     Friday
12       23   Saturday
13       26     Sunday
14       29     Monday
15       38     Monday
16       43    Tuesday
17       59  Wednesday
18       61  Wednesday

I'd like to end up with something like this:

    first  last  records    weekday
0       1     3        1     Monday
1       1     3        2     Monday
2       1     3        3     Monday
3       4     6        4    Tuesday
4       4     6        6    Tuesday
5       7     7        7  Wednesday
6       8    14        8   Thursday
7       8    14       12   Thursday
8       8    14       14   Thursday
9      15    19       15     Friday
10     15    19       16     Friday
11     15    19       19     Friday
12     23    23       23   Saturday
13     26    26       26     Sunday
14     29    38       29     Monday
15     29    38       38     Monday
16     43    43       43    Tuesday
17     59    61       59  Wednesday
18     59    61       61  Wednesday

So where do I start? Is the right approach to iterate over the weekday column from top to bottom while watching for any change?
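For comparison, the row-by-row approach described in the question could look roughly like the sketch below. The helper name first_last_by_iteration is hypothetical, not from the original post. It produces the desired columns, but a vectorized groupby (as in the answer) is usually shorter and tends to scale better on large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
    'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday'],
})


def first_last_by_iteration(frame):
    """Hypothetical helper: walk the rows top to bottom and close the current
    run whenever the weekday differs from the previous row."""
    weekdays = frame['weekday'].tolist()
    records = frame['records'].tolist()
    firsts, lasts = [], []
    run_start = 0  # position of the first row in the current run
    for i in range(1, len(frame) + 1):
        # A run ends at the end of the frame or where the weekday changes
        if i == len(frame) or weekdays[i] != weekdays[i - 1]:
            run_len = i - run_start
            firsts.extend([records[run_start]] * run_len)  # first record of the run
            lasts.extend([records[i - 1]] * run_len)       # last record of the run
            run_start = i
    out = frame.copy()
    out['first'] = firsts
    out['last'] = lasts
    return out


result = first_last_by_iteration(df)
print(result)
```

Watching for a change between neighbouring rows is exactly the idea that the vectorized version expresses with shift() and cumsum().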

1 answer:

Answer 0 (score: 1)

Use the compare-cumsum-groupby pattern:

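# Label each run of consecutive identical weekdays with its own integer
groups = (df.weekday != df.weekday.shift()).cumsum()

# transform returns one value per original row, so the results align with df
df['first'] = df.groupby(groups).records.transform('first')
df['last'] = df.groupby(groups).records.transform('last')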
>>> df
    records    weekday  first  last
0         1     Monday      1     3
1         2     Monday      1     3
2         3     Monday      1     3
3         4    Tuesday      4     6
4         6    Tuesday      4     6
5         7  Wednesday      7     7
6         8   Thursday      8    14
7        12   Thursday      8    14
8        14   Thursday      8    14
9        15     Friday     15    19
10       16     Friday     15    19
11       19     Friday     15    19
12       23   Saturday     23    23
13       26     Sunday     26    26
14       29     Monday     29    38
15       38     Monday     29    38
16       43    Tuesday     43    43
17       59  Wednesday     59    61
18       61  Wednesday     59    61
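To see why the weekday name on its own is not a good enough group key, the sketch below (an illustration added here, not part of the original answer) groups directly on the weekday column. Both Monday runs (rows 0-2 and rows 14-15) end up in a single group, so row 0 gets a last value of 38 instead of 3, and row 14 gets a first value of 1 instead of 29.

```python
import pandas as pd

df = pd.DataFrame({
    'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
    'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday'],
})

# Naive version: grouping on the raw weekday name merges non-adjacent runs
naive = df.assign(
    first=df.groupby('weekday').records.transform('first'),
    last=df.groupby('weekday').records.transform('last'),
)

# Rows 0 and 14 belong to different Monday runs but get identical values
print(naive.loc[[0, 14]])
```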

The trick is to give each weekday run a unique index: not just 1-7, but a counter that increases by 1 every time a new weekday starts.

df['week_counter'] = (df.weekday != df.weekday.shift()).cumsum()
>>> df
    records    weekday  week_counter
0         1     Monday             1
1         2     Monday             1
2         3     Monday             1
3         4    Tuesday             2
4         6    Tuesday             2
5         7  Wednesday             3
6         8   Thursday             4
7        12   Thursday             4
8        14   Thursday             4
...
16       43    Tuesday             9
17       59  Wednesday            10
18       61  Wednesday            10
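The counter can be unpacked into its intermediate steps. The sketch below, with the helper names prev, changed and steps chosen only for illustration, shows that shift() moves each weekday down one row, the comparison marks where a new run starts, and cumsum() counts those starts. Row 0 is always marked as a start because its shifted value is NaN, and 'Monday' != NaN evaluates to True.

```python
import pandas as pd

df = pd.DataFrame({
    'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
    'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday'],
})

prev = df.weekday.shift()        # previous row's weekday; NaN in row 0
changed = df.weekday != prev     # True wherever a new run begins
week_counter = changed.cumsum()  # booleans add up as 0/1, so the label grows by 1 per new run

steps = pd.DataFrame({'weekday': df.weekday, 'prev': prev,
                      'changed': changed, 'week_counter': week_counter})
print(steps)
```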

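Then use these week_counter values with groupby to form groups of records, and use transform (which keeps the same shape as the original DataFrame) to get the first and last records value of each group.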
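The point about transform keeping the shape can be checked directly. In the sketch below (the names grouped, bounds and first_col are illustrative only), agg collapses the 19 rows to one row per run (10 runs), while transform broadcasts each run's value back onto every original row, so its result shares df's index and can be assigned as a new column.

```python
import pandas as pd

df = pd.DataFrame({
    'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
    'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday'],
})

week_counter = (df.weekday != df.weekday.shift()).cumsum()
grouped = df.groupby(week_counter).records

# agg reduces each run to a single row, indexed by the run label
bounds = grouped.agg(['first', 'last'])

# transform returns one value per original row, aligned with df.index
first_col = grouped.transform('first')

print(bounds)
print(first_col.tolist())
```

The reduced bounds table can be handy on its own as a per-run summary; transform is the better fit when, as in the question, the values should sit next to every original row.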