Enumerate clusters of categorical data in a DataFrame

Date: 2019-04-27 07:18:46

Tags: python pandas group-by categories enumeration

I have a DataFrame that looks like this:

    col1        col2    col3
0      0   string(0)  type B
1      1   string(1)  type B
2      2   string(2)  type B
3      3   string(3)  type B
4      4   string(4)  type A
5      5   string(5)  type A
6      6   string(6)  type A
7      7   string(7)  type A
8      8   string(8)  type A
9      9   string(9)  type A
10    10  string(10)  type A
11    11  string(11)  type A
12    12  string(12)  type B
13    13  string(13)  type B
14    14  string(14)  type A
15    15  string(15)  type A
16    16  string(16)  type A
17    17  string(17)  type A
18    18  string(18)  type A
19    19  string(19)  type A
20    20  string(20)  type A
21    21  string(21)  type B
22    22  string(22)  type B
23    23  string(23)  type B
24    24  string(24)  type A
25    25  string(25)  type A
26    26  string(26)  type A
27    27  string(27)  type A
28    28  string(28)  type A
29    29  string(29)  type A

I'm looking for the most efficient way to extract one specific type from col3 and enumerate its clusters like this:

    col1        col2    col3  col4
0      0   string(0)  type B     0
1      1   string(1)  type B     0
2      2   string(2)  type B     0
3      3   string(3)  type B     0
12    12  string(12)  type B     1
13    13  string(13)  type B     1
21    21  string(21)  type B     2
22    22  string(22)  type B     2
23    23  string(23)  type B     2

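The enumeration is based on clusters of the type, i.e. runs of consecutive rows with the same value. For example, a 0 in col4 means the row belongs to the 0th cluster of type B. Thanks in advance for your help.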
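To pin down the definition, here is a plain-Python sketch (not from the original post) that walks the column run by run with `itertools.groupby` and numbers each run of the target type. It is much slower than a vectorized pandas solution, but it can serve as a reference to check one against. The helper name `enumerate_clusters` is hypothetical.

```python
from itertools import groupby


def enumerate_clusters(values, target):
    """Hypothetical helper: return (position, cluster_index) pairs for every
    element equal to `target`, where cluster_index counts the runs of
    consecutive `target` values, starting from 0."""
    result = []
    cluster = 0
    pos = 0
    for value, run in groupby(values):  # one iteration per run of equal values
        length = sum(1 for _ in run)     # size of the current run
        if value == target:
            result.extend((p, cluster) for p in range(pos, pos + length))
            cluster += 1
        pos += length
    return result


# col3 from the question: type B at positions 0-3, 12-13 and 21-23
col3 = ['type A'] * 30
for i in [0, 1, 2, 3, 12, 13, 21, 22, 23]:
    col3[i] = 'type B'

print(enumerate_clusters(col3, 'type B'))
# [(0, 0), (1, 0), (2, 0), (3, 0), (12, 1), (13, 1), (21, 2), (22, 2), (23, 2)]
```

The (position, cluster) pairs match the col4 values in the desired output above.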

Edit: here is the code that generates the DataFrame above:

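import pandas as pd

n = 30
df = pd.DataFrame({'col1': list(range(n)),
                   'col2': [f'string({i})' for i in range(n)]})

df['col3'] = 'type A'
# Mark the type B rows with .loc: the chained df['col3'].iloc[...] = ... form can
# raise SettingWithCopyWarning and does nothing once Copy-on-Write is enabled
df.loc[[0, 1, 2, 3, 12, 13, 21, 22, 23], 'col3'] = 'type B'  # create col3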

1 answer:

Answer 0 (score: 2)

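First compare col3 with its shifted values (Series.shift) using not-equal (Series.ne); this marks the first row of every cluster with True. Then filter to the type you want and number the clusters with Series.cumsum, subtracting 1 so the numbering starts at 0: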

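df['col4'] = df['col3'].ne(df['col3'].shift())  # True at the first row of each run
df = df[df['col3'] == 'type B'].copy()           # keep type B only; .copy() avoids SettingWithCopyWarning
df['col4'] = df['col4'].cumsum() - 1             # running count of run starts, 0-based
print(df)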
    col1        col2    col3  col4
0      0   string(0)  type B     0
1      1   string(1)  type B     0
2      2   string(2)  type B     0
3      3   string(3)  type B     0
12    12  string(12)  type B     1
13    13  string(13)  type B     1
21    21  string(21)  type B     2
22    22  string(22)  type B     2
23    23  string(23)  type B     2
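If you need the cluster number for every type rather than just one, the same shift/ne/cumsum idea can be extended. This is a sketch of a variant, not part of the original answer: the cumulative sum gives each run a global id, and a dense rank of those ids within each type (via `groupby(...).rank(method='dense')`) renumbers them from 0 per type. It rebuilds the question's DataFrame so it runs on its own.

```python
import pandas as pd

n = 30
df = pd.DataFrame({'col1': list(range(n)),
                   'col2': [f'string({i})' for i in range(n)]})
df['col3'] = 'type A'
df.loc[[0, 1, 2, 3, 12, 13, 21, 22, 23], 'col3'] = 'type B'

# Global run id: increments every time col3 differs from the previous row
run_id = df['col3'].ne(df['col3'].shift()).cumsum()

# Renumber the runs separately for each type: 0 for its first cluster, 1 for the next, ...
# rank() returns floats, so cast back to int before shifting to a 0-based count
df['col4'] = run_id.groupby(df['col3']).rank(method='dense').astype(int) - 1

# Selecting a single type reproduces the answer's output
print(df[df['col3'] == 'type B'])
```

Filtering on `'type A'` instead shows the A clusters (rows 4-11, 14-20, 24-29) numbered 0, 1 and 2 in the same way, without recomputing anything.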