Compare two CSVs and remove every row of the first that also appears in the second

Asked: 2019-07-18 17:35:14

Tags: python

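I have two CSV files (A and B). I need to delete any row in CSV-A that shares a value with a row in CSV-B. Only the first column (the email) needs to be compared: if a row's email also exists in CSV-B, the row should be removed from CSV-A. Is there a simple way to do this?

I have done something similar in PowerShell, but I would like to do it in Python: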

$fileA = Import-csv '.\CSV-A.csv'
$fileB = Import-csv '.\CSV-B.csv'

$deduped = Compare-Object -Ref $fileA -Diff $fileB -Property email -PassThru | 
  Where-Object Sideindicator -eq '<=' | 
    Select-Object * -ExcludeProperty Sideindicator

$deduped
$deduped | Export-csv '.\deduped-output-file.csv' -NoTypeInformation

1 answer:

Answer 0 (score: 0)

Contents of a.csv:

email1@gmail.com,val1,val2
email2@gmail.com,val1,val2
email3@gmail.com,val1,val2

b.csv

different1@gmail.com,val1,val2
email2@gmail.com,val1,val2
different2@gmail.com,val1,val2

Script:

import csv

with open('a.csv', 'r', newline='') as f_a, \
    open('b.csv', 'r', newline='') as f_b, \
    open('export.csv', 'w', newline='') as f_export:

    csvreader_a = csv.reader(f_a, delimiter=',', quotechar='"')
    csvreader_b = csv.reader(f_b, delimiter=',', quotechar='"')

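    # Collect the first column (email) of every non-empty row in b.csv.
    # The `if row` guard skips blank lines, which csv.reader yields as [].
    emails_to_remove = {row[0] for row in csvreader_b if row}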

    csvwriter = csv.writer(f_export, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Copy each row of a.csv unless it is blank or its email is in b.csv.
    for row in csvreader_a:
        if not row or row[0] in emails_to_remove:
            continue
        csvwriter.writerow(row)

This produces export.csv:

email1@gmail.com,val1,val2
email3@gmail.com,val1,val2
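
The PowerShell snippet in the question compares on a property named email, which suggests the real files have a header row, while the sample files above do not. If that is the case, something along these lines might be closer to what is needed. It is only a sketch: the function name remove_matching_rows and its parameters are made up for illustration, and it assumes both files have a header row containing an email column.

```python
import csv


def remove_matching_rows(path_a, path_b, path_out, key="email"):
    """Copy rows of path_a to path_out, skipping rows whose `key` value
    also appears in path_b. Returns the number of rows written.

    Assumes both files start with a header row that contains `key`.
    """
    # Build a set of the key column in B for fast membership tests.
    with open(path_b, newline="") as f_b:
        keys_to_remove = {row[key] for row in csv.DictReader(f_b)}

    with open(path_a, newline="") as f_a, \
            open(path_out, "w", newline="") as f_out:
        reader = csv.DictReader(f_a)
        # Reuse A's header so the output keeps the same columns and order.
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()

        kept = 0
        for row in reader:
            if row[key] not in keys_to_remove:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling remove_matching_rows('a.csv', 'b.csv', 'export.csv') would then write the header plus every row of a.csv whose email is not in b.csv. The comparison is exact, so addresses that differ only in case or surrounding whitespace are treated as different.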