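I have two CSV files (A and B). I need to remove any row from CSV-A that shares a value with a row in CSV-B. Only the first column (the email address) has to be compared: if that email also exists in CSV-B, the row should be dropped from CSV-A. Is there a simple way to do this?
I have done something similar in PowerShell, but I would like to do it in Python: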
$fileA = Import-csv '.\CSV-A.csv'
$fileB = Import-csv '.\CSV-B.csv'
$deduped = Compare-Object -Ref $fileA -Diff $fileB -Property email -PassThru |
Where-Object Sideindicator -eq '<=' |
Select-Object * -ExcludeProperty Sideindicator
$deduped
$deduped | Export-csv '.\deduped-output-file.csv' -NoTypeInformation
Answer 0 (score: 0)
Given a.csv:
email1@gmail.com,val1,val2
email2@gmail.com,val1,val2
email3@gmail.com,val1,val2
and b.csv:
different1@gmail.com,val1,val2
email2@gmail.com,val1,val2
different2@gmail.com,val1,val2
the script:
import csv

with open('a.csv', 'r', newline='') as f_a, \
        open('b.csv', 'r', newline='') as f_b, \
        open('export.csv', 'w', newline='') as f_export:
    csvreader_a = csv.reader(f_a, delimiter=',', quotechar='"')
    csvreader_b = csv.reader(f_b, delimiter=',', quotechar='"')

    # Collect the first column (email) of every row in b.csv into a set
    emails_to_remove = set(email for email, *_ in csvreader_b)

    csvwriter = csv.writer(f_export, delimiter=',',
                           quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Copy each row of a.csv unless its email also appears in b.csv
    for email, *rest in csvreader_a:
        if email in emails_to_remove:
            continue
        csvwriter.writerow([email] + rest)
produces export.csv:
email1@gmail.com,val1,val2
email3@gmail.com,val1,val2
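The sample files above have no header row, whereas the PowerShell version in the question selects the column by name (-Property email), which suggests the real files may start with a header line. As a rough sketch under that assumption, the same set-based filtering could be written with csv.DictReader and csv.DictWriter. The function name remove_matching_rows, its parameters, and the default column name "email" are illustrative choices, not something taken from the question; the sketch also assumes both files are non-empty and every row has the same number of fields as the header.

```python
import csv


def remove_matching_rows(path_a, path_b, path_out, key="email"):
    """Copy rows of path_a to path_out, skipping rows whose `key`
    value also appears in path_b. Returns the number of rows kept.

    Assumes (hypothetically) that both files begin with a header row
    containing a column named `key`.
    """
    with open(path_b, "r", newline="") as f_b:
        # Gather the key column of B into a set for fast membership tests.
        keys_to_remove = {row[key] for row in csv.DictReader(f_b)}

    kept = 0
    with open(path_a, "r", newline="") as f_a, \
            open(path_out, "w", newline="") as f_out:
        reader = csv.DictReader(f_a)
        # Reuse A's header so the output keeps the same columns and order.
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] in keys_to_remove:
                continue  # this email exists in B, so drop the row
            writer.writerow(row)
            kept += 1
    return kept
```

Calling remove_matching_rows('CSV-A.csv', 'CSV-B.csv', 'deduped-output-file.csv') would then mirror the PowerShell pipeline from the question. The comparison is an exact string match, so addresses that differ only in case or surrounding whitespace are treated as different; normalizing with something like row[key].strip().lower() on both sides is one option if that matters.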