Replacing duplicate strings in a Pandas DataFrame

Time: 2019-04-03 12:09:15

Tags: pandas dataframe

I have a DataFrame df:

    Name        Reagent
0   Experiment1 water
1   Experiment1 oil
2   Experiment1 water
3   Experiment1 milk
4   Experiment1 water
5   Experiment1 tea
6   Experiment1 water
7   Experiment1 coffee
8   Experiment2 water
9   Experiment2 coffee

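I want to replace names that are repeated within the same experiment with some kind of distinguishing suffix. In this example, only water is repeated within a given experiment.

For example: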

    Name        Reagent
0   Experiment1 water1
1   Experiment1 oil
2   Experiment1 water2
3   Experiment1 milk
4   Experiment1 water3
5   Experiment1 tea
6   Experiment1 water4
7   Experiment1 coffee
8   Experiment2 water
9   Experiment2 coffee

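Thanks for your help.

1 answer:

Answer 0: (score: 2)

Solution: append GroupBy.cumcount to every value as a counter, replacing the 0 values with an empty string so that the first occurrence in each group is left unchanged: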

df['Reagent'] += df.groupby(['Name','Reagent']).cumcount().astype(str).replace('0','')
print(df)
          Name Reagent
0  Experiment1   water
1  Experiment1     oil
2  Experiment1  water1
3  Experiment1    milk
4  Experiment1  water2
5  Experiment1     tea
6  Experiment1  water3
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee
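
To make the intermediate step visible, here is a minimal self-contained sketch that rebuilds the example frame and looks at the counter before it is appended. The names counter, suffix and result are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

# cumcount numbers the rows inside each (Name, Reagent) group, starting at 0
counter = df.groupby(["Name", "Reagent"]).cumcount()
print(counter.tolist())  # [0, 0, 1, 0, 2, 0, 3, 0, 0, 0]

# Series.replace matches whole values, so only an exact "0" becomes "";
# a counter such as "10" would be left as it is
suffix = counter.astype(str).replace("0", "")

# Build the new column without modifying df in place
result = df["Reagent"] + suffix
print(result.tolist())
```

Because the first occurrence in each group gets an empty suffix, values that appear only once in an experiment stay unchanged.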

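If you instead need to number every duplicated value, including its first occurrence, filter the rows with DataFrame.duplicated on both columns (with keep=False) and add 1 to the counter: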

mask = df.duplicated(['Name','Reagent'], keep=False)
df.loc[mask, 'Reagent'] += df[mask].groupby(['Name','Reagent']).cumcount().add(1).astype(str)
print(df)
          Name Reagent
0  Experiment1  water1
1  Experiment1     oil
2  Experiment1  water2
3  Experiment1    milk
4  Experiment1  water3
5  Experiment1     tea
6  Experiment1  water4
7  Experiment1  coffee
8  Experiment2   water
9  Experiment2  coffee
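
The second approach could also be wrapped in a small reusable function. This is only a sketch: number_duplicates and its parameters keys and target are hypothetical names, not part of pandas or of the answer above, and it returns a copy instead of modifying the frame in place.

```python
import pandas as pd

def number_duplicates(df, keys, target):
    """Hypothetical helper: append 1, 2, 3, ... to `target` for rows whose
    `keys` combination occurs more than once; other rows are left as is."""
    out = df.copy()
    # keep=False marks every member of a duplicated group, including the first
    mask = out.duplicated(keys, keep=False)
    # Number the duplicated rows within each group, starting at 1
    counter = out[mask].groupby(keys).cumcount().add(1).astype(str)
    # Both operands share the same index, so the addition aligns row by row
    out.loc[mask, target] = out.loc[mask, target] + counter
    return out

df = pd.DataFrame({
    "Name": ["Experiment1"] * 8 + ["Experiment2"] * 2,
    "Reagent": ["water", "oil", "water", "milk", "water",
                "tea", "water", "coffee", "water", "coffee"],
})

numbered = number_duplicates(df, ["Name", "Reagent"], "Reagent")
print(numbered)
```

The water row in Experiment2 keeps its name because it is not duplicated within that experiment, which matches the output requested in the question.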