Merge 2 CSV files using Python

Time: 2016-01-25 14:19:23

Tags: python csv

I have 2 CSV files as follows:

File1.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com

File2.csv:

Email
mona@email.com
james@email.com

What I want is File1.csv without the rows listed in File2.csv, i.e. File3.csv (the output) should look like this:

File3.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com

What is the simplest way to code this in Python?

5 answers:

Answer 0: (score: 1)

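dont_need_em = []
with open("File2.csv", 'r') as fn:
    for line in fn:
        # Skip the header line; collect every email to exclude
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())

with open("File1.csv", 'r') as fn, open("File3.csv", 'w') as fw:
    for line in fn:
        # Keep the line only if its email (second column) is not excluded
        if line.rstrip().split(", ")[1] not in dont_need_em:
            # Restore the newline that rstrip() removed
            fw.write(line.rstrip() + "\n")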

This should do it, but I'm sure there are simpler solutions.

Edit: now creates the third file.

Answer 1: (score: 1)

With Pandas you can do this:

import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)

#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]

#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)

Answer 2: (score: 0)

Here is a nice way to do it (it is very similar to the one above, but writes the remainder to a file instead of printing it):

removed = []
with open("File2.csv", 'r') as f2:
    for line in f2:
        # Skip the header line; collect the emails to filter out
        if not line.startswith("Email"):
            removed.append(line.rstrip())

with open("File1.csv", 'r') as f1:
    with open("File3.csv", 'w') as f3:
        for line in f1:
            # Write the line only if its email (second column) is not filtered
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)

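How this works: the first block reads all the emails to be filtered out into a list. The second block then opens the original file and creates a new file to receive whatever remains. It reads each line from your first file and writes it to the third file only if its email is not in the list of emails to filter out.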
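
The two steps described above can be sketched as a small function. This is only an illustration under stated assumptions, not the answerer's code: `filter_rows` is a hypothetical name, it works on lines that have already been read rather than on file paths, and it swaps the list for a set so that each membership check takes constant time. It assumes the layout from the question: a header line in both inputs and `", "` as the separator.

```python
def filter_rows(file1_lines, file2_lines):
    """Return the lines of file1 whose email does not appear in file2.

    Hypothetical helper: both arguments are iterables of text lines,
    each beginning with a header line, as in the question's files.
    """
    # Step 1: collect the emails to filter out (skip the "Email" header).
    removed = set()
    for line in file2_lines:
        email = line.strip()
        if email and email != "Email":
            removed.add(email)

    # Step 2: keep each line of file1 whose second column is not in the set.
    kept = []
    for line in file1_lines:
        parts = line.rstrip("\n").split(", ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if parts[1].strip() not in removed:
            kept.append(line)
    return kept


# Small in-memory example using rows from the question
file1 = ["Name, Email\n", "Jon, jon@email.com\n", "Mona, mona@email.com\n"]
file2 = ["Email\n", "mona@email.com\n"]
print("".join(filter_rows(file1, file2)), end="")
```

With these inputs the header line and Jon's row are kept and Mona's row is dropped. Passing open file objects instead of lists should work the same way, since both are iterables of lines.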

Answer 3: (score: 0)

If you are on UNIX:

[shell command missing from the source]

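Answer 4: (score: 0)

The following should do what you need. It first reads File2.csv into a set of email addresses to skip. It then reads File1.csv row by row and writes only the rows whose email is not in the skip set: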

import csv

with open('File2.csv', 'r') as file2:
    skip_list = set(line.strip() for line in file2.readlines()[1:])

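# Python 3: the csv module needs text mode with newline='' (Python 2 used 'rb'/'wb')
with open('File1.csv', 'r', newline='') as file1, open('File3.csv', 'w', newline='') as file3: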
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))    # Write the header line

    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)

This gives you the following output in File3.csv:

Name,Email
Jon,jon@email.com
Roberto,roberto@email.com
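
The same approach can be written so that the email column is looked up by its header name rather than by position. The sketch below is a variation, not the answerer's code: `remove_listed_emails` and its `key` parameter are hypothetical names, and it assumes Python 3 and that both files have a header row containing an `Email` column.

```python
import csv


def remove_listed_emails(file1_path, file2_path, file3_path, key="Email"):
    """Copy file1 to file3, dropping rows whose `key` value is listed in file2.

    Hypothetical helper; assumes both files have a header row containing `key`.
    """
    # Read the addresses to skip; DictReader consumes the header row itself.
    with open(file2_path, newline="") as f2:
        skip = {
            (row.get(key) or "").strip()
            for row in csv.DictReader(f2, skipinitialspace=True)
        }

    with open(file1_path, newline="") as f1, \
            open(file3_path, "w", newline="") as f3:
        reader = csv.DictReader(f1, skipinitialspace=True)
        # Reuse the input's column names; ignore stray extra fields in a row.
        writer = csv.DictWriter(
            f3, fieldnames=reader.fieldnames, extrasaction="ignore"
        )
        writer.writeheader()
        for row in reader:
            # A short row yields None for the missing column, hence the `or ""`.
            if (row.get(key) or "").strip() not in skip:
                writer.writerow(row)
```

Calling `remove_listed_emails('File1.csv', 'File2.csv', 'File3.csv')` on the files from the question should produce the same rows as shown above. Looking the column up by name keeps the filter working if the columns in File1.csv are ever reordered.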