I have a CSV file whose contents repeat, like this:
"col1", "col2","col3"
Integer, Integer, Varchar(50)
7, 8, 21554
24, 25, 36544
"col1", "col2","col3"
Integer, Integer, Varchar(50)
7, 8, 21554
24, 25, 36544
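How can I strip out the repeated part, including the second header row, the data-type row, and the data rows that follow?
I only want this: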
"col1", "col2","col3"
Integer, Integer, Varchar(50)
7, 8, 21554
24, 25, 36544
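Answer 0 (score: 1)
We don't even need the csv module. We remember the first line of the file, then keep writing lines until we see that line again; at that point we stop, which leaves the output truncated just before the repeat.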
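with open('infile.csv', newline='') as infile, open('outfile.csv', 'w+', newline='') as outfile:
    # Remember the header line and copy it to the output.
    first = next(infile)
    outfile.write(first)
    for line in infile:
        # The header showing up again marks the start of the repeated block.
        if line == first:
            break
        outfile.write(line)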
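A self-contained variant of the same idea may make it easier to try out. This is only a sketch: the helper name `copy_until_header_repeats` and the in-memory `sample` are made up for illustration, and it assumes the repeated header is textually identical to the first line. Unlike the snippet above, it compares lines with their line endings stripped, so a repeated header on the very last line (with no trailing newline) is still recognised.

```python
import io


def copy_until_header_repeats(infile, outfile):
    """Copy lines from infile to outfile, stopping when the first line reappears."""
    first = next(infile, None)
    if first is None:  # empty input: nothing to copy
        return
    outfile.write(first)
    header = first.rstrip("\r\n")
    for line in infile:
        # Compare without line endings so a header on the final line,
        # which may lack a trailing newline, still matches.
        if line.rstrip("\r\n") == header:
            break
        outfile.write(line)


# Hypothetical sample mirroring the file in the question.
sample = (
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
)

src, dst = io.StringIO(sample), io.StringIO()
copy_until_header_repeats(src, dst)
result = dst.getvalue()
print(result, end="")
```

Because the helper only needs an iterable of lines and an object with `write`, it should work equally well with real file objects opened as in the answer above.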
Answer 1 (score: 0)
You could do it with the csv module (assuming Python 2.x) like this:
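import csv

seen = set()  # rows that have already been written
# Python 2: the csv module expects files opened in binary mode.
with open('duplicates.csv', 'rb') as infile, open('cleaned.csv', 'wb') as outfile:
    reader = csv.reader(infile, skipinitialspace=True)
    writer = csv.writer(outfile)
    # Rows are converted to tuples so they can be stored in a set.
    # Note: this drops every repeated row, not only the repeated header block.
    for row in (tuple(row) for row in reader):
        if row not in seen:
            writer.writerow(row)
            seen.add(row)
print('done')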
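For Python 3, the same set-based approach might look roughly like the sketch below. The generator name `unique_rows` and the in-memory `sample` are illustrative assumptions, not part of the answer; with real files you would open both with `newline=''` in text mode instead of using `'rb'`/`'wb'`.

```python
import csv
import io


def unique_rows(rows):
    """Yield each row the first time it appears, preserving order."""
    seen = set()
    for row in rows:
        key = tuple(row)  # lists are not hashable; tuples are
        if key not in seen:
            seen.add(key)
            yield row


# Hypothetical sample mirroring the file in the question.
sample = (
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
    '"col1", "col2","col3"\n'
    'Integer, Integer, Varchar(50)\n'
    '7, 8, 21554\n'
    '24, 25, 36544\n'
)

reader = csv.reader(io.StringIO(sample, newline=""), skipinitialspace=True)
out = io.StringIO(newline="")
csv.writer(out).writerows(unique_rows(reader))
cleaned = out.getvalue()
print(cleaned, end="")
```

Two things to be aware of with this approach: it removes any row that occurs more than once, so legitimately duplicated data rows would be lost too, and `csv.writer` with its default settings rewrites the rows without the original quotes or the spaces after the commas.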