I use the following Python script to remove duplicates from a CSV file:
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        out_file.write(line)
I'm trying to modify it so that, instead of writing the de-duplicated list to final.csv, it writes only the values that occur exactly once — the opposite of what it does now. Does anyone have an example?
Answer 0 (score: 2)
Use a dict to track how many times each line occurs; then walk the dict, add only the unique items to the seen set, and write them to final.csv:
from collections import defaultdict

uniques = defaultdict(int)
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        uniques[line] += 1  # count occurrences of each line
    for k, v in uniques.items():  # use iteritems() on Python 2
        if v == 1:
            seen.add(k)
            out_file.write(k)
Or:
from collections import defaultdict

uniques = defaultdict(int)
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    for line in in_file:
        uniques[line] += 1
    seen = set(k for k in uniques if uniques[k] == 1)
    for itm in seen:
        out_file.write(itm)
Or, using Counter:
from collections import Counter

with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    lines = Counter(in_file.readlines())  # count occurrences of each line
    seen = set(k for k in lines if lines[k] == 1)
    for itm in seen:
        out_file.write(itm)
This outputs only the lines that appear exactly once, which may or may not be what you mean by "uniques". If instead you want to output every line, but only one instance of each, use this last method:
from collections import Counter

with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    lines = Counter(in_file.readlines())
    for itm in lines:
        out_file.write(itm)
Answer 1 (score: 0)
You could collect the duplicates into another variable and use them to remove the non-unique values from the set.
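A minimal sketch of that idea (assuming Python 3; the sample input and variable names are illustrative, not from the answer): any line seen a second time goes into a duplicates set, and a set difference then removes the non-unique values.

```python
# sample input standing in for the question's test.csv
with open('test.csv', 'w') as f:
    f.write('a\nb\na\nc\n')

seen = set()
duplicates = set()
with open('test.csv') as in_file:
    for line in in_file:
        if line in seen:
            duplicates.add(line)  # second (or later) occurrence -> duplicate
        seen.add(line)

# remove the non-unique values from the set of all lines
uniques = seen - duplicates

with open('final.csv', 'w') as out_file:
    out_file.writelines(sorted(uniques))
```

With the sample input, final.csv ends up containing only `b` and `c`, since `a` occurs twice.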