How to compare two CSV files in Python

Asked: 2016-02-28 18:50:10

Tags: python csv

I have two CSV files. One is called 'Standard reg.csv' and the other is called 'Driver Details.csv'.

In 'Standard reg.csv', the first two rows are:

['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit']
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']

The first two rows of 'Driver Details.csv' are:

['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration']
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']

My code is:

import csv
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',')
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',')
for row in file_1:
    reg = row[3]
    avgspeed = row[7]
    for row in file_2:
        firstname = row[0]
        lastname = row[1]
        address = row[2]
        city = row[3]
        region = row[4]
        reg2 = row[7]
if reg  == reg2:
    print('Match found')
else:
    print('No match found')

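This is a work in progress, but I can't seem to get the code to compare anything other than the last row.

If I put print(reg) after the line reg2 = row[7], it shows that the whole column has been read. Putting print(reg2) after reg2 = row[7] also prints the entire column.

But at if reg == reg2: it only takes the last row of each column and compares those, and I don't know how to fix this.

Thanks in advance.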

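3 Answers:

Answer 0: (score: 1)

The test if reg == reg2 sits outside both loops (the one over file_1 and the one over file_2). That is why it only ever runs once, using the values left over from the last row of each file.

Another problem is that you use the same loop variable, row, in both for loops, so the inner loop overwrites the outer loop's row.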
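
A minimal sketch of how both fixes might look, assuming the driver file is small enough to read into a list up front. The function name find_matches and the in-memory sample data (built with io.StringIO) are illustrative and not from the question; with real files you would pass open file objects instead.

```python
import csv
import io


def find_matches(standard_file, driver_file):
    """Return (reg, first name, last name) for every plate found in both files."""
    speed_rows = csv.reader(standard_file)
    next(speed_rows)  # skip the header row

    # Read the driver rows once; a csv.reader can only be iterated a single time.
    driver_rows = list(csv.reader(driver_file))[1:]

    matches = []
    for speed_row in speed_rows:          # distinct loop variable names
        for driver_row in driver_rows:
            # The comparison now runs inside both loops, once per pair of rows.
            if speed_row[3] == driver_row[7]:
                matches.append((speed_row[3], driver_row[0], driver_row[1]))
    return matches


# Hypothetical sample data modelled on the rows shown in the question.
standard = io.StringIO(
    "Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit\n"
    "1,1,2016,NU16REG,1,1,1,5816.1667859699355\n"
    "2,1,2016,AB12CDE,1,1,1,12.5\n"
)
drivers = io.StringIO(
    "FirstName,LastName,StreetAddress,City,Region,Country,PostCode,Registration\n"
    "Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,NU16REG\n"
)

print(find_matches(standard, drivers))
```

The nested loop is kept here only to stay close to the original code; it compares every row of one file with every row of the other, so it gets slow as the files grow.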

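Answer 1: (score: 1)

I would suggest you first load all of the details from Driver Details.csv into a dictionary, using the registration number as the key. That makes it easy to look up a given entry without having to read every row from the file again: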

import csv

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError as e:
            print('No match found')

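The code you have iterates through file_2 and leaves the file pointer either at the end of the file (if no match was found) or at the position of the match (so for the next entry it could miss a match that sits earlier in the file). For your approach to work, you would have to start reading the file from the beginning on every pass of the outer loop, which would be very slow.

To add an output csv and display the full address, you could do something like the following: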
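
To illustrate the point about the file pointer, here is a small sketch showing that a csv.reader can only be consumed once unless the underlying file is rewound. The two-column in-memory data is a made-up stand-in for 'Driver Details.csv'.

```python
import csv
import io

# Hypothetical in-memory stand-in for 'Driver Details.csv'.
f = io.StringIO("FirstName,Registration\nViolet,NU16REG\n")
reader = csv.reader(f)

first_pass = [row for row in reader]    # consumes every row (header + 1 data row)
second_pass = [row for row in reader]   # nothing left to read

print(len(first_pass), len(second_pass))   # 2 0

# Rewinding the underlying file lets a new reader start from the top again.
f.seek(0)
third_pass = list(csv.reader(f))
print(len(third_pass))                     # 2
```

Rewinding like this for every row of the other file would work, but it rereads the whole file each time, which is the slowness mentioned above.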

import csv

speed = 74.3
fine = 35

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header
    csv_output = csv.writer(f_output)

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError as e:
            print('No match found')

This would print the following:

Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE

And it would produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35

Answer 2: (score: 0)

Try using csv.DictReader to eliminate most of the code:

import csv
from collections import defaultdict

# Map each registration plate to the list of violations recorded for it
Violations = defaultdict(list)

# Read in the violations, there are probably fewer violations than drivers (I hope!)
with open('Standard reg.csv') as violations:
    for v in csv.DictReader(violations):
        # Append rather than assign, so len() below counts violations per plate
        Violations[v['Reg Plate']].append(v)

with open('Driver Details.csv') as drivers:
    for d in csv.DictReader(drivers):
        # d is a dict, so the format fields use [key] lookups, not attributes
        fullname = "{driver[FirstName]} {driver[LastName]}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))
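
To see how a per-plate count can be built from csv.DictReader rows, here is a small sketch on in-memory data. The sample rows and the function name count_violations are made up for illustration; the idea is only that each plate maps to a list of rows, so the length of that list is the number of violations.

```python
import csv
import io
from collections import defaultdict


def count_violations(violations_file):
    """Group violation rows by registration plate and return plate -> count."""
    by_plate = defaultdict(list)
    for row in csv.DictReader(violations_file):
        # DictReader yields one dict per row, keyed by the header names.
        by_plate[row['Reg Plate']].append(row)
    return {plate: len(rows) for plate, rows in by_plate.items()}


# Hypothetical data: the same plate is recorded twice.
sample = io.StringIO(
    "Reg Plate,Speed over limit\n"
    "NU16REG,12.5\n"
    "NU16REG,3.0\n"
    "AB12CDE,7.1\n"
)
print(count_violations(sample))
```

Assigning a single row to the key instead of appending would keep only the last violation for each plate, and len() of that row would then count its columns rather than the violations.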