I have this DataFrame:
import pandas as pd

df = pd.DataFrame({'a': ('road', 'road', 'road', 'highway', 'house', 'house'),
                   'b': ('11', '23', '15', '32', '17', '21')})
which gives:
df
         a   b
0     road  11
1     road  23
2     road  15
3  highway  32
4    house  17
5    house  21
I want to create a new column that is 1 if the row's value in a is duplicated, and 0 otherwise.
Here I filter the duplicated values:
mask = df['a'].duplicated(keep = False)
df[mask]
which gives:
      a   b
0  road  11
1  road  23
2  road  15
4 house  17
5 house  21
Desired result:
         a   b  c
0     road  11  1
1     road  23  1
2     road  15  1
3  highway  32  0
4    house  17  1
5    house  21  1
Answer 0 (score: 0)
You can assign the result of df['a'].duplicated(keep = False) directly to a new column, for example:
df['c'] = df['a'].duplicated(keep = False)
As a result, we get:
>>> df
         a   b      c
0     road  11   True
1     road  23   True
2     road  15   True
3  highway  32  False
4    house  17   True
5    house  21   True
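A side note on why keep = False matters here. With the default (keep='first'), the first occurrence of each repeated value is not flagged, so only some of the road and house rows would be marked. The small sketch below compares the two settings on the same values as column a. The names s, first and every are only illustrative.

```python
import pandas as pd

# Same values as column 'a' in the question.
s = pd.Series(['road', 'road', 'road', 'highway', 'house', 'house'])

# Default keep='first': the first occurrence of each value is left unflagged.
first = s.duplicated()

# keep=False: every row belonging to a repeated value is flagged.
every = s.duplicated(keep=False)

print(first.tolist())  # [False, True, True, False, False, True]
print(every.tolist())  # [True, True, True, False, True, True]
```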
Or, if you want integers:
df['c'] = df['a'].duplicated(keep = False).astype(int)
which produces the expected result:
>>> df
         a   b  c
0     road  11  1
1     road  23  1
2     road  15  1
3  highway  32  0
4    house  17  1
5    house  21  1
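For convenience, here is the whole thing as one self-contained sketch you can paste and run. The alt variable is my own addition, not part of the approach above: it shows a possible alternative that counts group sizes with groupby, which should give the same flags on this data.

```python
import pandas as pd

df = pd.DataFrame({'a': ('road', 'road', 'road', 'highway', 'house', 'house'),
                   'b': ('11', '23', '15', '32', '17', '21')})

# Flag every row whose value in 'a' occurs more than once, as 0/1.
df['c'] = df['a'].duplicated(keep=False).astype(int)

# Alternative (an assumption, not from the answer above):
# a value is repeated when its group contains more than one row.
alt = (df.groupby('a')['a'].transform('size') > 1).astype(int)

print(df)
print(alt.equals(df['c']))
```

The duplicated version is shorter and is probably the one to prefer. The groupby variant may be handy if you later need the actual counts rather than just a flag.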