I have 2 CSV files as follows:
File1.csv:
Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com
File2.csv:
Email
mona@email.com
james@email.com
What I want is File1.csv without the rows listed in File2.csv, i.e. File3.csv (the output) should look like this:
File3.csv:
Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
What is the simplest way to write this in Python?
Answer 0 (score: 1)
dont_need_em = []
with open("File2.csv", 'r') as fn:
    for line in fn:
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())
# Copy every line of File1.csv whose email (second field) is not in dont_need_em
with open("File1.csv", 'r') as fn, open("File3.csv", 'w') as fw:
    for line in fn:
        if line.rstrip().split(", ")[1] not in dont_need_em:
            fw.write(line)  # keep the newline so each row stays on its own line
This should do it, but I'm sure there are simpler solutions.
Edit: it now creates the third file.
Answer 1 (score: 1)
With Pandas you can do it like this:
import pandas as pd

# Read both files into DataFrames, taking column names from the first row
file1 = pd.read_csv('File1.csv', header=0, skipinitialspace=True)
file2 = pd.read_csv('File2.csv', header=0, skipinitialspace=True)
# Keep only the rows of file1 whose email is not contained in file2
cleaned = file1[~file1["Email"].isin(file2["Email"])]
# Write the result to CSV with the original headers
cleaned.to_csv("File3.csv", index=False)
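Answer 2 (score: 0)
Here is a good way to do it (it is very similar to the answer above, but writes the remaining lines to a file instead of printing them):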
removed = []  # same lowercase name as used in the loops below
with open("File2.csv", 'r') as f2:
    for line in f2:
        if not line.startswith("Email"):
            removed.append(line.rstrip())
with open("File1.csv", 'r') as f1:
    with open("File3.csv", 'w') as f3:
        for line in f1:
            # Write the line unchanged unless its email is in the removal list
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)
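How this works: the first block reads all the emails to be filtered out into a list. The second block then opens the original file and sets up a new file to write the remaining lines to. It reads each line from your first file and writes it to the third file, provided that the line's email is not in your filter list.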
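To check the two-pass logic above without touching the disk, here is a minimal sketch that runs the same steps on in-memory text via `io.StringIO`, using the sample data from the question. The function name `filter_csv_text` is hypothetical. It assumes the question's layout (a header row, fields separated by a comma plus optional spaces) and swaps the list for a `set`, so each membership test is constant time on average instead of a scan of the whole list.

```python
import csv
import io


def filter_csv_text(file1_text, file2_text):
    """Hypothetical helper: return file1_text without rows whose Email is listed in file2_text."""
    # Pass 1: collect the emails to remove, skipping File2's "Email" header row
    reader2 = csv.reader(io.StringIO(file2_text), skipinitialspace=True)
    next(reader2)
    removed = {row[0] for row in reader2 if row}

    # Pass 2: stream File1, copying the header and every row not in `removed`
    reader1 = csv.reader(io.StringIO(file1_text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader1))
    for row in reader1:
        if row and row[1] not in removed:  # `if row` skips blank lines
            writer.writerow(row)
    return out.getvalue()


file1 = """Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com
"""
file2 = """Email
mona@email.com
james@email.com
"""
print(filter_csv_text(file1, file2), end="")
```

Because `csv.writer` does not re-insert the space after each comma, the output reads `Name,Email` rather than `Name, Email`; writing the original `line` unchanged, as the answer above does, preserves the input formatting instead.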
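Answer 4 (score: 0)
The following should do what you need. It first reads File2.csv into a set of email addresses to skip, then reads File1.csv row by row and writes only the rows whose email is not in the skip set: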
import csv

# Set of emails to skip: every line of File2.csv after its header
with open('File2.csv', 'r') as file2:
    skip_list = set(line.strip() for line in file2.readlines()[1:])

# In Python 3, open CSV files in text mode with newline=''; 'rb'/'wb' only works in Python 2
with open('File1.csv', newline='') as file1, open('File3.csv', 'w', newline='') as file3:
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))  # Write the header line
    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)
This gives you the following output in File3.csv:
Name,Email
Jon,jon@email.com
Roberto,roberto@email.com
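A variant of the same approach, sketched under the assumption that both files carry an `Email` header as in the question: `csv.DictReader` and `csv.DictWriter` pick the column by name instead of by position (`cols[1]`), so the filter keeps working if File1.csv gains or reorders columns. The function name `remove_listed_emails` and its `key` parameter are hypothetical; the usage part writes the question's sample files into a temporary directory.

```python
import csv
import os
import tempfile


def remove_listed_emails(file1_path, file2_path, out_path, key="Email"):
    """Hypothetical helper: copy file1 to out_path, dropping rows whose `key` appears in file2."""
    # Emails to skip, read by column name; DictReader consumes the header row itself
    with open(file2_path, newline="") as f2:
        skip = {row[key] for row in csv.DictReader(f2, skipinitialspace=True)}

    with open(file1_path, newline="") as f1, open(out_path, "w", newline="") as f3:
        reader = csv.DictReader(f1, skipinitialspace=True)
        writer = csv.DictWriter(f3, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[key] not in skip:
                writer.writerow(row)


# Usage with the sample data from the question, in a throwaway directory
with tempfile.TemporaryDirectory() as d:
    p1, p2, p3 = (os.path.join(d, n) for n in ("File1.csv", "File2.csv", "File3.csv"))
    with open(p1, "w", newline="") as f:
        f.write("Name, Email\nJon, jon@email.com\nRoberto, roberto@email.com\n"
                "Mona, mona@email.com\nJames, james@email.com\n")
    with open(p2, "w", newline="") as f:
        f.write("Email\nmona@email.com\njames@email.com\n")
    remove_listed_emails(p1, p2, p3)
    with open(p3, newline="") as f:
        result = f.read()

print(result)
```

`csv.DictWriter` uses the `csv` module's default `\r\n` line terminator, so the rows in `result` end in `\r\n`; pass `lineterminator="\n"` to the writer if you need Unix line endings.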