Question

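Say I have a DataFrame with several columns. One column holds an identifier (ID) for some individuals, and another holds some characteristic of theirs, say the level of misdemeanor they committed. An example: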

`df
Out[63]: 
    Crime  ID
0      13   1
1      13   1
2      12   1
3      12   1
4      13   3
5      13   3
6      13   3
7      63   3
8      63   3
9      63   3
10     63   3
11      3   3
12      7   6
13      7   6
14     13   6
15     13   6
16     45   6`

Is it possible to subdivide the IDs by kind of crime? A possible output would be:

`df1
Out[64]: 
    Crime  ID
0      13   1
1      13   1
2      12   1.1
3      12   1.1
4      13   3
5      13   3
6      13   3
7      63   3.1
8      63   3.1
9      63   3.1
10     63   3.1
11      3   3.2
12      7   6
13      7   6
14     13   6.1
15     13   6.1
16     45   6.2`

Thanks in advance.

Answer 1

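I can't think of a good way to do this in a vectorized fashion, but it is relatively easy to do with a loop.

First, you need a dict mapping (Crime, ID) pairs to the new ID, so that, for example, row 9 gets the same ID as row 7.

Next, you need a dict mapping each ID to the highest sub-ID used so far, so that, for example, row 16 gets a different ID from rows 12 and 14.

Something like this (untested):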

def remap(df):
    pairmap = {}
    subidmap = {}
    for row in df.itertuples():
        if (row.Crime, row.ID) not in pairmap:
            if row.ID not in subidmap:
                subidmap[row.ID] = 0
                subid = str(row.ID)
            else:
                subidmap[row.ID] += 1
                subid = f'{row.ID}.{subidmap[row.ID]}'
            pairmap[row.Crime, row.ID] = subid
        yield pairmap[row.Crime, row.ID]    

# Build the new labels from the original frame and write them back
df['ID'] = list(remap(df))
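
To make the two-dict idea above easy to check without pandas, here is a minimal sketch that works on plain sequences. The function name `suffix_ids` and the variable names are my own, not part of the answer, and it assumes that the first crime seen under an ID keeps the bare ID while each later new crime gets `.1`, `.2`, and so on.

```python
def suffix_ids(crimes, ids):
    """Return one label per row: the ID itself for the first crime seen
    under that ID, and 'ID.k' for the k-th additional distinct crime."""
    pair_to_label = {}  # (crime, id) -> label already assigned to that pair
    seen_count = {}     # id -> number of distinct crimes seen so far
    labels = []
    for crime, id_ in zip(crimes, ids):
        key = (crime, id_)
        if key not in pair_to_label:
            k = seen_count.get(id_, 0)
            seen_count[id_] = k + 1
            # First crime under this ID keeps the bare ID; later ones get a suffix
            pair_to_label[key] = str(id_) if k == 0 else f"{id_}.{k}"
        labels.append(pair_to_label[key])
    return labels


# Sample data copied from the question
crimes = [13, 13, 12, 12, 13, 13, 13, 63, 63, 63, 63, 3, 7, 7, 13, 13, 45]
ids = [1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6]
labels = suffix_ids(crimes, ids)
print(labels)
```

With a DataFrame, the same function could presumably be applied as `df['ID'] = suffix_ids(df['Crime'], df['ID'])`, since it only iterates over the two columns in parallel.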

Answer 2

There may be a better solution, but for now I think a nested groupby will do it.

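import numpy as np
import pandas as pd

# Number each distinct Crime within its ID (0, 1, 2, ...) in order of appearance
v = df.groupby('ID', sort=False).apply(
        lambda x: x.groupby('Crime', sort=False).ngroup()).reset_index(drop=True)
# Keep the bare ID for group 0, otherwise append '.<group number>'
df['ID'] = np.where(
        v.eq(0), df['ID'], df['ID'].astype(str) + '.' + v.astype(str))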

df
    Crime   ID
0      13    1
1      13    1
2      12  1.1
3      12  1.1
4      13    3
5      13    3
6      13    3
7      63  3.1
8      63  3.1
9      63  3.1
10     63  3.1
11      3  3.2
12      7    6
13      7    6
14     13  6.1
15     13  6.1
16     45  6.2

Answer 3

Use groupby together with factorize:

s = (df.groupby(['ID'], as_index=False)['Crime']
       .apply(lambda x: ('.' + pd.Series(pd.factorize(x)[0]).astype(str)).replace('.0', ''))
       .reset_index(drop=True))
s
Out[121]: 
0       
1       
2     .1
3     .1
4       
5       
6       
7     .1
8     .1
9     .1
10    .1
11    .2
12      
13      
14    .1
15    .1
16    .2
Name: Crime, dtype: object

df.ID.astype(str)+s
Out[122]: 
0       1
1       1
2     1.1
3     1.1
4       3
5       3
6       3
7     3.1
8     3.1
9     3.1
10    3.1
11    3.2
12      6
13      6
14    6.1
15    6.1
16    6.2
dtype: object
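
The factorize idea can also be written with `transform`, which returns a result aligned to the frame's own index. The snippets above use `reset_index(drop=True)`, which I would only rely on when the frame has a default integer index and each ID's rows sit together. This is a sketch under that reading, not a tested drop-in replacement; the name `sub` is mine.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Crime": [13, 13, 12, 12, 13, 13, 13, 63, 63, 63, 63, 3, 7, 7, 13, 13, 45],
    "ID": [1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6],
})

# Number each distinct Crime within its ID in order of first appearance.
# transform keeps the original row index, so no reset_index is needed.
sub = df.groupby("ID")["Crime"].transform(lambda s: pd.factorize(s)[0])

# Keep the bare ID where the sub-number is 0, otherwise append '.<sub-number>'
id_str = df["ID"].astype(str)
df["ID"] = np.where(sub.eq(0), id_str, id_str + "." + sub.astype(str))
print(df)
```

Note that the resulting `ID` column holds strings throughout, which avoids mixing integers and strings in one column.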

Add subsection suffix values to pandas column values
