Question

I'm having trouble renaming all of the duplicate names in a DataFrame column.

Column 1: x,y,z,....(all different names)
Column 2: a,b,c,.....(all different names)
Column 3: p,pq,r,s,p,s,r,pq,p.....

I need Column 3 to become p_1, pq_1, r_1, s_1, p_2, s_2, r_2, pq_2, p_3, .....

Column 3 contains many duplicates, and I want every one of them named as shown above, including the first occurrence.

I tried the code below, but the output is:

Column 3: p,pq,r,s,p_1,s_1,r_1,pq_1,p_2,.....

def df_name_uniquify(RS):
    df_names = RS["Column 3"]
    new_names = []
    for item in df_names:
        counter = 0
        newitem = item
        # The first occurrence is appended unchanged; only later
        # duplicates get a suffix, which is why the result is p, p_1, p_2, ...
        while newitem in new_names:
            counter += 1
            newitem = "{}_{}".format(item, counter)
        new_names.append(newitem)
    RS["Column 3"] = new_names
    return RS
df = df_name_uniquify(RS)
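For comparison, a minimal sketch of how the loop could be changed to number every occurrence, first one included: keep a per-name count of how many times each name has been seen so far and always append that count. The helper name number_all_occurrences is hypothetical, and this version works on a plain list rather than on the DataFrame directly.

```python
from collections import defaultdict


def number_all_occurrences(names):
    """Append _1, _2, ... to every name, counting each distinct name separately."""
    seen = defaultdict(int)  # occurrences of each name encountered so far
    result = []
    for name in names:
        seen[name] += 1
        result.append("{}_{}".format(name, seen[name]))
    return result


print(number_all_occurrences(["p", "pq", "r", "s", "p", "s", "r", "pq", "p"]))
```

Applied to the DataFrame from the question, something like RS["Column 3"] = number_all_occurrences(RS["Column 3"]) should replace the column, since iterating a Series yields its values. Unlike the original while loop, this needs no membership test on the growing list, so it stays linear in the number of rows.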

Any suggestions or changes to the code would be helpful.

Thanks in advance.

Answer 1

You can use groupby().cumcount() here, adding 1 because cumcount() starts counting at 0:

df['new_col'] = df['Column3'] + '_' + (df.groupby('Column3').cumcount() + 1).astype(str)
print(df)

  Column3 new_col
0       p     p_1
1      pq    pq_1
2       r     r_1
3       s     s_1
4       p     p_2
5       s     s_2
6       r     r_2
7      pq    pq_2
8       p     p_3
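A self-contained version of the answer, as a sketch: it builds sample data from the values listed in the question and assumes the column is named Column3 (no space), as in the answer. If your column is really "Column 3", use that name in both places.

```python
import pandas as pd

# Sample data mirroring the question's Column 3.
df = pd.DataFrame({"Column3": ["p", "pq", "r", "s", "p", "s", "r", "pq", "p"]})

# cumcount() numbers the rows within each group as 0, 1, 2, ... in their
# original order; adding 1 gives the suffixes _1, _2, _3, ...
occurrence = df.groupby("Column3").cumcount() + 1
df["new_col"] = df["Column3"] + "_" + occurrence.astype(str)

# To overwrite the original column instead of adding a new one:
# df["Column3"] = df["new_col"]
print(df)
```

The result of cumcount() keeps the original row index, so the concatenation lines up row by row even though the groups are interleaved.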
