I have a DataFrame that looks like this:
col1 col2 col3
0 0 string(0) type B
1 1 string(1) type B
2 2 string(2) type B
3 3 string(3) type B
4 4 string(4) type A
5 5 string(5) type A
6 6 string(6) type A
7 7 string(7) type A
8 8 string(8) type A
9 9 string(9) type A
10 10 string(10) type A
11 11 string(11) type A
12 12 string(12) type B
13 13 string(13) type B
14 14 string(14) type A
15 15 string(15) type A
16 16 string(16) type A
17 17 string(17) type A
18 18 string(18) type A
19 19 string(19) type A
20 20 string(20) type A
21 21 string(21) type B
22 22 string(22) type B
23 23 string(23) type B
24 24 string(24) type A
25 25 string(25) type A
26 26 string(26) type A
27 27 string(27) type A
28 28 string(28) type A
29 29 string(29) type A
I am looking for the most efficient way to extract one specific type from col3 and enumerate its rows like this:
col1 col2 col3 col4
0 0 string(0) type B 0
1 1 string(1) type B 0
2 2 string(2) type B 0
3 3 string(3) type B 0
12 12 string(12) type B 1
13 13 string(13) type B 1
21 21 string(21) type B 2
22 22 string(22) type B 2
23 23 string(23) type B 2
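The enumeration is based on clusters of the type, that is, runs of consecutive rows with the same type. For example, a 0 in col4 marks the 0th cluster of type B. Thanks in advance for your help.
Edit: the code that generates the DataFrame above is: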
import pandas as pd

n = 30
df = pd.DataFrame({'col1': list(range(n)),
                   'col2': [f'string({i})' for i in range(n)]})
df['col3'] = 'type A'
# create col3: assign through .loc instead of chained df['col3'].iloc[...]
df.loc[[0, 1, 2, 3, 12, 13, 21, 22, 23], 'col3'] = 'type B'
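Answer 0 (score: 2)
First compare each value with the previous one (Series.shift) for inequality using Series.ne, then filter, and use Series.cumsum to number the groups. The comparison is True on the first row of every run, so after filtering to type B the cumulative sum counts the type B clusters: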
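# True on the first row of each run of identical col3 values
df['col4'] = df['col3'].ne(df['col3'].shift())
# keep only type B rows; .copy() avoids SettingWithCopyWarning on the next assignment
df = df[df['col3'] == 'type B'].copy()
# count run starts among the kept rows; subtract 1 so numbering starts at 0
df['col4'] = df['col4'].cumsum() - 1
print(df)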
col1 col2 col3 col4
0 0 string(0) type B 0
1 1 string(1) type B 0
2 2 string(2) type B 0
3 3 string(3) type B 0
12 12 string(12) type B 1
13 13 string(13) type B 1
21 21 string(21) type B 2
22 22 string(22) type B 2
23 23 string(23) type B 2
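If the original DataFrame should stay unchanged, the same steps can be wrapped in a small helper. This is only a sketch of the approach above: the function name enumerate_runs and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def enumerate_runs(df, column, value, out="col4"):
    """Keep rows where df[column] == value and number each consecutive run from 0.

    Hypothetical helper (not a pandas API); it leaves the input frame untouched.
    """
    # True on the first row of every run of identical values in `column`
    run_start = df[column].ne(df[column].shift())
    mask = df[column].eq(value)
    # Copy so that adding a column does not modify the caller's frame
    result = df[mask].copy()
    # Among the kept rows only the starts of `value` runs are True, so the
    # cumulative sum counts runs; subtract 1 so the numbering starts at 0
    result[out] = run_start[mask].cumsum() - 1
    return result


# Rebuild the DataFrame from the question
n = 30
df = pd.DataFrame({"col1": range(n),
                   "col2": [f"string({i})" for i in range(n)]})
df["col3"] = "type A"
df.loc[[0, 1, 2, 3, 12, 13, 21, 22, 23], "col3"] = "type B"

out = enumerate_runs(df, "col3", "type B")
print(out["col4"].tolist())  # [0, 0, 0, 0, 1, 1, 2, 2, 2]
```

Because the run starts are computed on the full column before filtering, a type B cluster gets a new number only when a different type separates it from the previous one.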