我正在处理一个类似这样的时序数据帧,除了有数千行。我想创建一个新列,该列枚举具有相同“ sign”值的行块。即第0行将为0,第1行至第23行将为1,第24行至第30行将为2等...(按时间顺序排列很重要)完成此操作的最有效方法是什么?预先谢谢
Date sign
0 2011-01-27 1
1 2011-01-28 -1
2 2011-01-31 -1
3 2011-02-01 -1
4 2011-02-02 -1
5 2011-02-07 -1
6 2011-02-08 -1
7 2011-02-09 -1
8 2011-02-10 -1
9 2011-02-11 -1
10 2011-02-14 -1
11 2011-02-15 -1
12 2011-02-16 -1
13 2011-02-17 -1
14 2011-02-18 -1
15 2011-02-21 -1
16 2011-02-22 -1
17 2011-02-23 -1
18 2011-02-24 -1
19 2011-02-25 -1
20 2011-02-28 -1
21 2011-03-01 -1
22 2011-03-02 -1
23 2011-03-03 -1
24 2011-03-04 1
25 2011-03-07 1
26 2011-03-08 1
27 2011-03-09 1
28 2011-03-10 1
29 2011-03-11 1
30 2011-03-14 1
31 2011-03-15 -1
32 2011-03-16 -1
33 2011-03-17 -1
34 2011-03-18 -1
35 2011-03-21 -1
36 2011-03-22 -1
37 2011-03-23 -1
38 2011-03-24 -1
39 2011-03-25 -1
40 2011-03-28 -1
41 2011-03-29 1
42 2011-03-30 1
答案 0 :(得分:2)
您可以使用cumsum
获取diff
符号变化的位置:
df['new_column'] = (df.sign.diff()!=0).cumsum()-1
>>> df
Date sign new_column
0 2011-01-27 1 0
1 2011-01-28 -1 1
2 2011-01-31 -1 1
3 2011-02-01 -1 1
4 2011-02-02 -1 1
5 2011-02-07 -1 1
6 2011-02-08 -1 1
7 2011-02-09 -1 1
8 2011-02-10 -1 1
9 2011-02-11 -1 1
10 2011-02-14 -1 1
11 2011-02-15 -1 1
12 2011-02-16 -1 1
13 2011-02-17 -1 1
14 2011-02-18 -1 1
15 2011-02-21 -1 1
16 2011-02-22 -1 1
17 2011-02-23 -1 1
18 2011-02-24 -1 1
19 2011-02-25 -1 1
20 2011-02-28 -1 1
21 2011-03-01 -1 1
22 2011-03-02 -1 1
23 2011-03-03 -1 1
24 2011-03-04 1 2
25 2011-03-07 1 2
26 2011-03-08 1 2
27 2011-03-09 1 2
28 2011-03-10 1 2
29 2011-03-11 1 2
30 2011-03-14 1 2
31 2011-03-15 -1 3
32 2011-03-16 -1 3
33 2011-03-17 -1 3
34 2011-03-18 -1 3
35 2011-03-21 -1 3
36 2011-03-22 -1 3
37 2011-03-23 -1 3
38 2011-03-24 -1 3
39 2011-03-25 -1 3
40 2011-03-28 -1 3
41 2011-03-29 1 4
42 2011-03-30 1 4
答案 1 :(得分:1)
您可以这样做:
Task