I have a DataFrame df:
Name Reagent
0 Experiment1 water
1 Experiment1 oil
2 Experiment1 water
3 Experiment1 milk
4 Experiment1 water
5 Experiment1 tea
6 Experiment1 water
7 Experiment1 coffee
8 Experiment2 water
9 Experiment2 coffee
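I want to replace names that are repeated within the same experiment by adding some kind of distinguishing suffix. In this example, only water is repeated within a given experiment (Experiment1).
For example: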
Name Reagent
0 Experiment1 water1
1 Experiment1 oil
2 Experiment1 water2
3 Experiment1 milk
4 Experiment1 water3
5 Experiment1 tea
6 Experiment1 water4
7 Experiment1 coffee
8 Experiment2 water
9 Experiment2 coffee
Thanks for your help.
Answer 0 (score: 2)
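Solution: append a GroupBy.cumcount counter to every value, replacing the 0 values with an empty string so that the first occurrence in each group is left unchanged: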
# cumcount numbers the rows of each (Name, Reagent) group from 0; a counter equal to '0' becomes ''
df['Reagent'] += df.groupby(['Name','Reagent']).cumcount().astype(str).replace('0','')
print(df)
Name Reagent
0 Experiment1 water
1 Experiment1 oil
2 Experiment1 water1
3 Experiment1 milk
4 Experiment1 water2
5 Experiment1 tea
6 Experiment1 water3
7 Experiment1 coffee
8 Experiment2 water
9 Experiment2 coffee
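For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt from the question's sample data, and the intermediate names counter, suffix and result are only illustrative; it uses df.assign instead of += so the original df is not modified.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

# 0-based position of each row within its (Name, Reagent) group
counter = df.groupby(["Name", "Reagent"]).cumcount()

# Series.replace matches whole values here, so only the exact string "0"
# is blanked; a counter such as "10" would be left untouched
suffix = counter.astype(str).replace("0", "")

# Build a new frame rather than modifying df in place
result = df.assign(Reagent=df["Reagent"] + suffix)
print(result)
```

Because the counter starts at 0, the first water in Experiment1 keeps its name and the later ones become water1, water2, water3.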
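If you instead need to number every duplicated value, including the first occurrence, start again from the original df, filter the rows that are duplicated on both columns with DataFrame.duplicated (keep=False marks all members of a duplicated group), and add 1 to the counter so that it starts at 1: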
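# True for every row whose (Name, Reagent) pair occurs more than once
mask = df.duplicated(['Name','Reagent'], keep=False)
# number only the masked rows, starting at 1 within each group
df.loc[mask, 'Reagent'] += df[mask].groupby(['Name','Reagent']).cumcount().add(1).astype(str)
print(df)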
Name Reagent
0 Experiment1 water1
1 Experiment1 oil
2 Experiment1 water2
3 Experiment1 milk
4 Experiment1 water3
5 Experiment1 tea
6 Experiment1 water4
7 Experiment1 coffee
8 Experiment2 water
9 Experiment2 coffee
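The second variant can also be wrapped in a small helper, sketched below. The function name number_duplicates and its parameters keys and target are hypothetical, not part of pandas; it works on a copy so the input frame is left as it was.

```python
import pandas as pd


def number_duplicates(df, keys, target):
    """Return a copy of df where values of `target` that repeat within
    `keys` get a 1-based suffix (hypothetical helper, not a pandas API)."""
    out = df.copy()
    # keep=False flags every member of a duplicated group, not just the repeats
    mask = out.duplicated(keys, keep=False)
    # 1-based counter computed only on the duplicated rows
    counter = out[mask].groupby(keys).cumcount().add(1).astype(str)
    # both sides share the masked rows' index, so the addition aligns row by row
    out.loc[mask, target] = out.loc[mask, target] + counter
    return out


# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

result = number_duplicates(df, ["Name", "Reagent"], "Reagent")
print(result)
```

Values that occur only once in an experiment, such as water in Experiment2, are not in the mask and therefore stay unchanged.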