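Use matching column values in two CSV files to create a new file with the combined data

Time: 2016-12-27 19:45:29

Tags: python python-3.x csv

I am currently trying to compare two CSV files, using Python 3.6, to check whether the IP address in the first column of file1.csv appears in a row of file2.csv. If the address is in file2, I need to copy that row's second-column value into a new file that is otherwise identical to file1. The two files are set up as shown below:

File 1: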

XX.XXX.XXX.1,Test1
XX.XXX.XXX.2,Test2
XX.XXX.XXX.3,Test3
XX.XXX.XXX.4,Test4
XX.XXX.XXX.5,Test5
XX.XXX.XXX.6,Test6
XX.XXX.XXX.7,Test7
XX.XXX.XXX.8,Test8

and so on

File 2:

XX.XXX.XXX.6, Name6
XX.XXX.XXX.7, Name7
XX.XXX.XXX.8, Name8

I need the result.csv file to look like this:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8

My code so far is as follows:

import csv

f1 = open('file1.csv', 'r')
f2 = open('file2.csv', 'r')
f3 = open('results.csv', 'w')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    for file2_row in file2:
        results_row = file1_row
        x = file2_row[3]
        if file1_row[1] == file2_row[1]:
            results_row.append('Found. Name: ' + x)
            found = True
            break
    row += 1
if not found:
    results_row.append('Not found in File1')
c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

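Right now this code checks for identical rows rather than matching values. That means it matches nothing, because it requires both the IP column and the adjacent column to be the same in both files. It also compares row 1 with row 1, row 2 with row 2, and so on, whereas I need it to search one file for a match in the other instead of comparing rows by index.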

4 Answers:

Answer 0 (score: 1)

A pandas solution:

import pandas as pd

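# Neither file has a header row, so name the columns explicitly.
df1 = pd.read_csv('file1.csv', names=['a', 'b'])
df2 = pd.read_csv('file2.csv', names=['a', 'b'])
# A left join keeps every row of file1; rows without a match get NaN.
merged = pd.merge(df1, df2, on='a', how='left')
# na_rep fills in the missing names when the file is written.
merged.to_csv('results.csv', header=False, index=False, na_rep='Not found')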

Contents of results.csv:

XX.XXX.XXX.1,Test1,Not found
XX.XXX.XXX.2,Test2,Not found
XX.XXX.XXX.3,Test3,Not found
XX.XXX.XXX.4,Test4,Not found
XX.XXX.XXX.5,Test5,Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8

Answer 1 (score: 0)

I moved results_row and changed the indentation after row += 1:

import csv

# newline='' lets the csv module handle line endings itself.
f1 = open('file1.csv', 'r', newline='')
f2 = open('file2.csv', 'r', newline='')
f3 = open('results.csv', 'w', newline='')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

file2 = list(c2)

for file1_row in c1:
    row = 1
    found = False
    results_row = file1_row  # Moved out of the nested loop
    for file2_row in file2:
        x = file2_row[1]
        # Compare the IP columns (index 0) by value
        if file1_row[0] == file2_row[0]:
            results_row.append(x)
            found = True
            break
    row += 1
    if not found:
        results_row.append('Not found')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()
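
As a side note, the found flag can be replaced with Python's for/else, whose else branch runs only when the loop finishes without hitting break. A minimal sketch, using in-memory rows instead of the CSV files (the function name annotate and the sample rows are made up for illustration):

```python
def annotate(file1_rows, file2_rows, missing='Not found'):
    """Return the file1 rows with the matching file2 value appended."""
    results = []
    for file1_row in file1_rows:
        results_row = list(file1_row)  # copy, so the input rows are not modified
        for file2_row in file2_rows:
            if file1_row[0] == file2_row[0]:
                results_row.append(file2_row[1])
                break
        else:
            # Runs only if the inner loop was not ended by break.
            results_row.append(missing)
        results.append(results_row)
    return results


file1_rows = [['XX.XXX.XXX.5', 'Test5'], ['XX.XXX.XXX.6', 'Test6']]
file2_rows = [['XX.XXX.XXX.6', 'Name6']]
print(annotate(file1_rows, file2_rows))
```

This still scans file2 once per file1 row, so it is only a tidier form of the same nested loop.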

Answer 2 (score: 0)

A solution based on what you tried:

with open('result.csv', 'w') as out:
    with open('file1.csv', 'r') as f1, open('file2.csv', 'r') as f2:
        # Skip blank lines (a bare newline has length 1)
        f2_lines = [line for line in f2.readlines() if len(line) > 1]
        f1_lines = [line for line in f1.readlines() if len(line) > 1]
        for line in f1_lines:
            val = 'Not found'
            ip = line.split(',')[0].strip()
            # Compare the IP fields exactly rather than testing for a substring
            b = [ip == item.split(',')[0].strip() for item in f2_lines]
            if any(b):
                val = f2_lines[b.index(True)].split(',')[1].strip()
            out.write('{}, {}\n'.format(line.strip(), val))

Output:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6, Name6
XX.XXX.XXX.7,Test7, Name7
XX.XXX.XXX.8,Test8, Name8
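
One thing to watch with line-based matching: a substring test such as ip in line treats an address as found whenever it is a prefix of a longer one. A small sketch of the difference, assuming a hypothetical extra row ending in .10 (the helper first_field is made up for illustration):

```python
def first_field(line):
    """Return the text before the first comma, without surrounding whitespace."""
    return line.split(',')[0].strip()


file1_line = 'XX.XXX.XXX.1,Test1\n'
file2_line = 'XX.XXX.XXX.10, Name10\n'  # hypothetical extra row

ip = first_field(file1_line)
substring_match = ip in file2_line           # True: '...1' is a prefix of '...10'
exact_match = ip == first_field(file2_line)  # False: the two fields differ
print(substring_match, exact_match)
```

Comparing the first fields avoids the false match without needing the csv module.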

Answer 3 (score: 0)

Here is a non-pandas solution (assuming you are using Python 3.x):

import csv

present = {}
with open('file2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, skipinitialspace=True)
    for ip, name in reader:
        present[ip] = name

with open('file1.csv', 'r', newline='') as file1, \
     open('results.csv', 'w', newline='') as results:
    reader = csv.reader(file1, skipinitialspace=True)
    writer = csv.writer(results)
    for ip, name in reader:
        writer.writerow([ip, name, present.get(ip, ' Not found')])

Contents of results.csv:

XX.XXX.XXX.1,Test1, Not found
XX.XXX.XXX.2,Test2, Not found
XX.XXX.XXX.3,Test3, Not found
XX.XXX.XXX.4,Test4, Not found
XX.XXX.XXX.5,Test5, Not found
XX.XXX.XXX.6,Test6,Name6
XX.XXX.XXX.7,Test7,Name7
XX.XXX.XXX.8,Test8,Name8
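
If the real files contain blank lines or more than two columns, the tuple unpacking in for ip, name in reader raises ValueError. A hedged variant of the same dict lookup that indexes the rows instead is sketched below. It runs on in-memory text via io.StringIO so it can be tried without the files; the function name merge_csv and the sample data are assumptions for illustration.

```python
import csv
import io


def merge_csv(file1, file2, results, missing=' Not found'):
    """Append to each file1 row the name that file2 lists for the same IP."""
    present = {}
    for row in csv.reader(file2, skipinitialspace=True):
        if row:  # skip blank lines
            present[row[0]] = row[1]

    # lineterminator='\n' keeps the in-memory output easy to compare;
    # the csv module's default is '\r\n'.
    writer = csv.writer(results, lineterminator='\n')
    for row in csv.reader(file1, skipinitialspace=True):
        if row:
            writer.writerow(row + [present.get(row[0], missing)])


file1 = io.StringIO('XX.XXX.XXX.5,Test5\nXX.XXX.XXX.6,Test6\n')
file2 = io.StringIO('XX.XXX.XXX.6, Name6\n')
results = io.StringIO()
merge_csv(file1, file2, results)
print(results.getvalue(), end='')
```

The same function should work with real file objects opened with newline='', as in the answer above.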