Create a new field that marks duplicates with 1 and non-duplicates with 0

Time: 2018-10-29 08:47:18

Tags: python pandas

I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'a': ('road', 'road', 'road', 'highway', 'house', 'house'),
                   'b': ('11', '23', '15', '32', '17', '21')})

which gives:

df

       a    b
0   road    11
1   road    23
2   road    15
3   highway 32
4   house   17
5   house   21

I want to create a new field that is 1 if the row's value in column a is duplicated, and 0 otherwise.

Here I filter the rows with duplicated values:

mask = df['a'].duplicated(keep = False)
df[mask]

which gives:

       a    b
0   road    11
1   road    23
2   road    15
4   house   17
5   house   21

Desired result:

       a    b    c
0   road    11   1
1   road    23   1
2   road    15   1
3   highway 32   0
4   house   17   1
5   house   21   1

1 Answer:

Answer 0: (score: 0)

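You can assign the result of df['a'].duplicated(keep = False) directly to a new column. With keep = False, every row whose value in a occurs more than once is marked, including the first occurrence: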

df['c'] = df['a'].duplicated(keep = False)

As a result, we get:

>>> df
         a   b      c
0     road  11   True
1     road  23   True
2     road  15   True
3  highway  32  False
4    house  17   True
5    house  21   True 

Or, if you want integers instead of booleans:

df['c'] = df['a'].duplicated(keep = False).astype(int)

which produces the expected result:

>>> df
         a   b  c
0     road  11  1
1     road  23  1
2     road  15  1
3  highway  32  0
4    house  17  1
5    house  21  1
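
For reference, here is a minimal self-contained sketch that puts the steps above together. It also adds a comparison column, c_first, which is a hypothetical name used only for illustration and is not part of the question. That column shows what the default keep='first' would produce: the first occurrence in each group is left unmarked, which is why keep = False is the option that matches the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['road', 'road', 'road', 'highway', 'house', 'house'],
    'b': ['11', '23', '15', '32', '17', '21'],
})

# keep=False marks every row whose value in 'a' occurs more than once;
# astype(int) turns the boolean mask into 1/0.
df['c'] = df['a'].duplicated(keep=False).astype(int)

# Illustration only (hypothetical column name): the default keep='first'
# does not mark the first occurrence of each repeated value.
df['c_first'] = df['a'].duplicated().astype(int)

print(df)
```

Here c is 1 for all road and house rows and 0 for highway, while c_first is 0 for the first road and the first house row.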