Remove duplicate rows from a CSV

Time: 2016-08-04 18:01:47

Tags: python csv

I have a CSV file that looks like this:

red,75,right
green,3,center
yellow,3222,right
blue,9,center
black,123,left
white,68,right
purple,988,right
pink,2677,left

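I'm using Python and trying to remove rows that have a duplicate value in cell 1 (the first column). I know I could do this with something like pandas, but I'm trying to use the standard Python csv library.

The expected result is...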

{{1}}

Does anyone have an example?

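2 answers:

Answer 0: (score: 1)

You can simply use a dictionary with the color as the key and the row as the value. If the color is already in the dictionary, skip that row; otherwise add it and write the row to the new CSV file.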

import csv

file_in = 'input_file.csv'
file_out = 'output_file.csv'
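# Python 2: the csv module expects files opened in binary mode ('rb'/'wb').
# On Python 3, open them in text mode with newline='' instead.
with open(file_in, 'rb') as fin, open(file_out, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    d = {}  # maps each color to the first row seen with that color
    for row in reader:
        color = row[0]
        if color not in d:
            d[color] = row
            writer.writerow(row)
result = d.values()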

result
# Output:
# [['blue', '9', 'center'],
# ['pink', '2677', 'left'],
# ['purple', '48', 'left'],
# ['yellow', '3222', 'right'],
# ['black', '123', 'left'],
# ['green', '3', 'center'],
# ['white', '68', 'right'],
# ['red', '75', 'right']]

Output of the CSV file:

!cat output_file.csv
# Output:
# red,75,right
# green,3,center
# yellow,3222,right
# blue,9,center
# black,123,left
# white,68,right
# purple,48,left
# pink,2677,left
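
The code above targets Python 2. As a rough sketch of the same first-occurrence-wins idea on Python 3, something like the following might work. The names `dedupe_rows`, `dedupe_csv`, and the `key_index` parameter are hypothetical, introduced only for illustration, and a set stands in for the dictionary because only the keys are needed for the lookup.

```python
import csv


def dedupe_rows(rows, key_index=0):
    """Yield each row whose key column has not been seen before.

    `key_index` is a hypothetical parameter; 0 means "cell 1".
    """
    seen = set()
    for row in rows:
        if not row:
            continue  # skip blank lines, which csv.reader returns as []
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_csv(path_in, path_out, key_index=0):
    """Copy path_in to path_out, keeping the first row for each key."""
    # On Python 3 the csv module wants text-mode files opened with newline=''.
    with open(path_in, newline='') as fin, \
            open(path_out, 'w', newline='') as fout:
        writer = csv.writer(fout)
        writer.writerows(dedupe_rows(csv.reader(fin), key_index))
```

Calling `dedupe_csv('input_file.csv', 'output_file.csv')` would then write the de-duplicated rows in their original order.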

Answer 1: (score: 0)

You can try this:

import fileinput

def main():
    seen = set() # set for fast O(1) amortized lookup

    for line in fileinput.FileInput('1.csv', inplace=1):
        cell_1 = line.split(',')[0]
        if cell_1 not in seen:
            seen.add(cell_1)
            # Python 2 print statement; the trailing comma avoids adding a second newline.
            print line, # standard output is now redirected to the file

if __name__ == '__main__':
    main()
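
The snippet above uses the Python 2 `print` statement. A minimal sketch of the same in-place approach on Python 3 could look like this. The function name `dedupe_in_place` is hypothetical, and splitting on a plain comma assumes the first cell never contains a quoted comma.

```python
import fileinput


def dedupe_in_place(path):
    """Rewrite `path` in place, keeping the first line for each first cell."""
    seen = set()
    # With inplace=True, fileinput redirects standard output into the file,
    # so every printed line becomes part of the new file contents.
    with fileinput.input(files=(path,), inplace=True) as lines:
        for line in lines:
            first_cell = line.split(',', 1)[0]
            if first_cell not in seen:
                seen.add(first_cell)
                # The line already ends with a newline, so suppress print's own.
                print(line, end='')
```

Because the file is overwritten, it may be worth working on a copy, or passing `backup='.bak'` to `fileinput.input`, which keeps the original content in a backup file.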