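I have a dataframe with a column of record numbers (in ascending order) and a column of weekdays. The goal is to extract the first and last record number for each day. For example: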
import pandas as pd

df = pd.DataFrame({'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
                   'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                               'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                               'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday']})
>>> df
records weekday
0 1 Monday
1 2 Monday
2 3 Monday
3 4 Tuesday
4 6 Tuesday
5 7 Wednesday
6 8 Thursday
7 12 Thursday
8 14 Thursday
9 15 Friday
10 16 Friday
11 19 Friday
12 23 Saturday
13 26 Sunday
14 29 Monday
15 38 Monday
16 43 Tuesday
17 59 Wednesday
18 61 Wednesday
I would like to end up with something like this:
first last records weekday
0 1 3 1 Monday
1 1 3 2 Monday
2 1 3 3 Monday
3 4 6 4 Tuesday
4 4 6 6 Tuesday
5 7 7 7 Wednesday
6 8 14 8 Thursday
7 8 14 12 Thursday
8 8 14 14 Thursday
9 15 19 15 Friday
10 15 19 16 Friday
11 15 19 19 Friday
12 23 23 23 Saturday
13 26 26 26 Sunday
14 29 38 29 Monday
15 29 38 38 Monday
16 43 43 43 Tuesday
17 59 61 59 Wednesday
18 59 61 61 Wednesday
So where do I start? Is iterating down the weekday column from top to bottom, watching for any change, the right approach?
Answer 0 (score: 1)
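# Group consecutive rows that share a weekday, then broadcast each group's
# first and last record number back onto every row of that group.
df['first'] = (df
               .groupby((df.weekday != df.weekday.shift()).cumsum())
               .records
               .transform('first'))
df['last'] = (df
              .groupby((df.weekday != df.weekday.shift()).cumsum())
              .records
              .transform('last'))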
>>> df
records weekday first last
0 1 Monday 1 3
1 2 Monday 1 3
2 3 Monday 1 3
3 4 Tuesday 4 6
4 6 Tuesday 4 6
5 7 Wednesday 7 7
6 8 Thursday 8 14
7 12 Thursday 8 14
8 14 Thursday 8 14
9 15 Friday 15 19
10 16 Friday 15 19
11 19 Friday 15 19
12 23 Saturday 23 23
13 26 Sunday 26 26
14 29 Monday 29 38
15 38 Monday 29 38
16 43 Tuesday 43 43
17 59 Wednesday 59 61
18 61 Wednesday 59 61
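The trick is to give each run of consecutive identical weekdays its own unique index (not just 1-7, but a counter that increases by 1 every time the weekday changes).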
df['week_counter'] = (df.weekday != df.weekday.shift()).cumsum()
>>> df
records weekday week_counter
0 1 Monday 1
1 2 Monday 1
2 3 Monday 1
3 4 Tuesday 2
4 6 Tuesday 2
5 7 Wednesday 3
6 8 Thursday 4
7 12 Thursday 4
8 14 Thursday 4
...
16 43 Tuesday 9
17 59 Wednesday 10
18 61 Wednesday 10
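To see why the counter works, it may help to break the expression into its two steps on a small series. This is only an illustrative sketch; the names `weekday`, `changed`, and `run_id` are made up for the example and are not part of the answer's code.

```python
import pandas as pd

# Hypothetical toy data: Monday appears in two separate runs.
weekday = pd.Series(['Monday', 'Monday', 'Tuesday', 'Monday', 'Monday'])

# Step 1: compare each value with the one above it.
# shift() moves everything down one row, so the first row is compared
# with a missing value and therefore counts as a change.
changed = weekday != weekday.shift()

# Step 2: a cumulative sum over the booleans goes up by 1 at every change,
# so each run of identical consecutive values gets its own id.
run_id = changed.cumsum()

print(pd.DataFrame({'weekday': weekday, 'changed': changed, 'run_id': run_id}))
```

Note that the second run of Mondays gets a different id from the first, which is exactly what a plain `groupby('weekday')` would not give you.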
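Then use the week_counter values in a groupby to form groups of records, and use transform (which keeps the same shape as the original dataframe) to get the first and last records of each group.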
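Putting the pieces together, the group key can be computed once and reused for both columns. The following is a sketch of that variation, not the answer's exact code: the names `run_id`, `grouped`, and `summary` are assumptions introduced here, and the one-row-per-run `summary` table is an optional extra that the question did not ask for.

```python
import pandas as pd

df = pd.DataFrame({
    'records': [1, 2, 3, 4, 6, 7, 8, 12, 14, 15, 16, 19, 23, 26, 29, 38, 43, 59, 61],
    'weekday': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Thursday',
                'Thursday', 'Thursday', 'Friday', 'Friday', 'Friday', 'Saturday', 'Sunday',
                'Monday', 'Monday', 'Tuesday', 'Wednesday', 'Wednesday'],
})

# One id per run of consecutive identical weekdays (hypothetical name).
run_id = (df['weekday'] != df['weekday'].shift()).cumsum().rename('run_id')

# Reuse the same grouping for both new columns.
grouped = df.groupby(run_id)['records']
df['first'] = grouped.transform('first')  # same length as df
df['last'] = grouped.transform('last')

# Optional: one row per run instead of one row per record.
summary = df.groupby(run_id).agg(
    weekday=('weekday', 'first'),
    first=('records', 'first'),
    last=('records', 'last'),
)

print(df)
print(summary)
```

No explicit loop over the weekday column is needed: the comparison with `shift()` does the "watch for a change" step for all rows at once.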