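I have CSV data in which a particular column contains duplicate entries, e.g. a,b,c,a,b,c,v,f,c ... I want to replace the values with a,b,c,a_1,b_1,c_1,v,f,c_2 ... I wrote the following code to find the duplicates: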
import csv
from collections import Counter
import pandas as pd

duplicate_names = []
file = '2018_Akola_August.csv'
with open(file, 'r', newline='') as csv_file:
    occurrences = Counter()
    for line in csv.reader(csv_file):
        email = line[3]  # value in the fourth column
        if email in occurrences:
            # Seen before: report it and record it as a duplicate
            print(email)
            duplicate_names.append(email)
            occurrences[email] += 1
        else:
            # First time this value appears
            occurrences[email] = 1
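The counting loop above can be extended to do the renaming in the same pass, without pandas. A minimal sketch, assuming the header row should be left alone and that the names sit at a known column index (3, as in line[3]); rename_duplicates, rename_column and the sample data are hypothetical illustrations, not part of any library:

```python
import csv
import io
from collections import Counter


def rename_duplicates(values):
    """Return a new list in which the n-th repeat of a value x becomes 'x_n'."""
    seen = Counter()  # missing keys count as 0
    renamed = []
    for value in values:
        count = seen[value]  # how many times this value appeared before this row
        renamed.append(value if count == 0 else f"{value}_{count}")
        seen[value] += 1
    return renamed


def rename_column(rows, col):
    """Apply rename_duplicates to column `col`, leaving the header row untouched."""
    header, *body = rows
    new_values = rename_duplicates([row[col] for row in body])
    for row, new_value in zip(body, new_values):
        row[col] = new_value
    return [header] + body


# Hypothetical sample file with the names in the fourth column (index 3)
sample = "id,x,y,Circle\n1,0,0,a\n2,0,0,b\n3,0,0,a\n"
out = rename_column(list(csv.reader(io.StringIO(sample))), 3)

print(rename_duplicates(list("abcabcvfc")))
# ['a', 'b', 'c', 'a_1', 'b_1', 'c_1', 'v', 'f', 'c_2']
print(out)
```

To write the result back, csv.writer(f).writerows(out) on a file opened with newline='' is enough. One caveat of this scheme: if the data already contains a value such as a_1, a generated name can collide with it.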
To replace the strings in the CSV column as well, I wrote the code below, but it doesn't work correctly when a value has two duplicates.
df = pd.read_csv(file, index_col=False, header=0)
#Finds 'a' and replaces it with 'a_1'
df.loc[df['Circle'] == 'a' , 'Circle']= 'a_1'
print(df)
df.to_csv(file, index=False)  # index=False: don't write the row index as an extra column
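By default, DataFrame.to_csv also writes the row index as an unnamed first column, so overwriting the input file this way adds a column each time the script runs. A minimal sketch, using an in-memory string as a hypothetical stand-in for the CSV file:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the CSV file
csv_text = "Circle\na\nb\na\n"
df = pd.read_csv(io.StringIO(csv_text))

# Same replacement as in the question: every row equal to 'a' is changed
df.loc[df['Circle'] == 'a', 'Circle'] = 'a_1'

with_index = df.to_csv()                # default: index written as a first column
without_index = df.to_csv(index=False)  # only the data columns

# Reading the default output back shows the extra column
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns.tolist())     # ['Unnamed: 0', 'Circle']
print(without_index.splitlines())  # ['Circle', 'a_1', 'b', 'a_1']
```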
I'm also unclear about what effect this statement has:
df.loc[df['Circle'] == 'a' , 'Circle'][]= 'a_1'
How can I rename such duplicates sequentially?
Answer 0 (score: 0)
Here is a two-step approach:
>>> df
Circle
0 a
1 b
2 c
3 a
4 b
5 c
6 v
7 f
8 c
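# Build "<value>_<n>" for repeated rows only; cumcount() supplies n (0 for a first occurrence)
dups = (df.loc[df['Circle'].duplicated(), 'Circle'] + '_' +
        df.groupby('Circle').cumcount().astype(str))
# After index alignment, rows that are not repeats are NaN, so skip them
df.loc[dups.notnull(), 'Circle'] = dups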
>>> df
Circle
0 a
1 b
2 c
3 a_1
4 b_1
5 c_1
6 v
7 f
8 c_2
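The numbering comes from groupby(...).cumcount(), which gives each row its position within its group, starting at 0. A sketch of the same idea in a single assignment, using Series.where to keep first occurrences and rewrite the rest (an alternative formulation, not the answer's exact code):

```python
import pandas as pd

df = pd.DataFrame({'Circle': list('abcabcvfc')})

# cumcount numbers rows within each group: 0 for the first 'a', 1 for the second, ...
counts = df.groupby('Circle').cumcount()

# Keep values whose count is 0; replace every later repeat with "<value>_<count>"
df['Circle'] = df['Circle'].where(counts == 0,
                                  df['Circle'] + '_' + counts.astype(str))

print(counts.tolist())         # [0, 0, 0, 1, 1, 1, 0, 0, 2]
print(df['Circle'].tolist())   # ['a', 'b', 'c', 'a_1', 'b_1', 'c_1', 'v', 'f', 'c_2']
```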
As for your second question, the line:
df.loc[df['Circle'] == 'a' , 'Circle']= 'a_1'
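takes every value in Circle that equals 'a' and changes it to 'a_1'. The mask also matches the first occurrence, so every 'a' becomes 'a_1'; that is why it cannot number repeats sequentially.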
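Printing the mask makes this visible. A minimal sketch with the sample column from the answer; df2 and repeat_mask are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Circle': list('abcabcvfc')})

# Boolean mask: True on every row equal to 'a' (rows 0 and 3), not just the repeat
mask = df['Circle'] == 'a'
print(mask[mask].index.tolist())  # [0, 3]

df.loc[mask, 'Circle'] = 'a_1'
print(df['Circle'].tolist())
# ['a_1', 'b', 'c', 'a_1', 'b', 'c', 'v', 'f', 'c']

# Combining with duplicated() skips the first 'a', but every later 'a'
# would still get the same 'a_1'; cumcount() is what numbers them 1, 2, ...
df2 = pd.DataFrame({'Circle': list('abcabcvfc')})
repeat_mask = (df2['Circle'] == 'a') & df2['Circle'].duplicated()
df2.loc[repeat_mask, 'Circle'] = 'a_1'
print(df2['Circle'].tolist())
# ['a', 'b', 'c', 'a_1', 'b', 'c', 'v', 'f', 'c']
```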