Question

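[Using Python 3] I want to compare the contents of two csv files and have the script print a message if the contents are the same. In other words, it should tell me whether all rows match and, if they don't, how many rows are mismatched.

I'd also like the flexibility to change the code later so that it writes all the mismatched rows to another file.

Also, although the two files should technically be identical, the rows may not be in the same order (except for the first row, which contains the field names).

The input files look like this: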

field1  field2  field3  field4  ...
string  float   float   string  ...
string  float   float   string  ...
string  float   float   string  ...
string  float   float   string  ...
string  float   float   string  ...
...     ...     ...     ...     ...

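The code I am currently running is shown below, but honestly I'm not sure whether this is the best (most pythonic) way to do it. I also don't understand what the try: while 1: ... code is doing. This code is the result of searching forums and the Python docs. So far, the code takes a long time to run.

Since I'm very new to this, I'd really appreciate any feedback on the code, and I'd also ask for an explanation of any suggestions you make.

Code: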

import csv
import difflib

'''
Checks the content of two csv files and returns a message.
If there is a mismatch, it will output the number of mismatches.
'''

def compare(f1, f2):

    file1 = open(f1).readlines()
    file2 = open(f2).readlines()

    diff = difflib.ndiff(file1, file2)

    count = 0

    try:
        while 1:
            count += 1
            next(diff)
    except:
        pass

    return 'Checked {} rows and found {} mismatches'.format(len(file1), count)

print (compare('outfile.csv', 'test2.csv'))

Edit: the files can contain duplicate rows, so storing the rows in a set won't work (because it would remove all the duplicates, right?).
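For reference, the duplicate concern above can be handled with a multiset instead of a set. The following is only a sketch under some assumptions: the files are comma-delimited (pass a different delimiter to csv.reader if they are tab-separated), rows are compared as the exact strings found in the file (so 1.0 and 1.00 count as different), and the names read_rows and compare_unordered are made up for illustration.

```python
import csv
from collections import Counter


def read_rows(path):
    """Return (header, Counter of data rows) for one CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)  # assumption: comma-delimited input
        header = next(reader, None)  # first row holds the field names
        # Tuples are hashable, and a Counter keeps how often each row occurs,
        # so duplicate rows are not lost the way they would be in a set.
        return header, Counter(tuple(row) for row in reader)


def compare_unordered(path1, path2):
    """Compare two CSV files, ignoring row order but respecting duplicates."""
    header1, rows1 = read_rows(path1)
    header2, rows2 = read_rows(path2)
    only_in_1 = rows1 - rows2  # rows (with multiplicity) missing from file 2
    only_in_2 = rows2 - rows1  # rows (with multiplicity) missing from file 1
    mismatches = sum(only_in_1.values()) + sum(only_in_2.values())
    return header1 == header2, mismatches, only_in_1, only_in_2
```

The two returned Counter objects hold the unmatched rows themselves, so writing them to another file later would only need a csv.writer loop over only_in_1.elements() and only_in_2.elements().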

Answer 1

The try-while block simply iterates over diff; you should use a for loop instead:

count = 0
for delta in diff:
    count += 1

Or, more pythonically, a generator expression:

count = sum(1 for delta in diff)

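(The original code increments count before each call to next(), including the final call that fails, so it gives a count that is one higher. I wonder whether that is intended.)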
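To make the difference concrete, here is a small sketch that runs both counting styles on the same input. The sample lines and the helper names count_like_original and count_with_for are invented for illustration; only difflib.ndiff comes from the question.

```python
import difflib

# Made-up sample data: one line differs between the two "files".
file1 = ["a\n", "b\n", "c\n"]
file2 = ["a\n", "x\n", "c\n"]


def count_like_original(diff):
    """Count the way the question's try/while loop does."""
    count = 0
    try:
        while True:
            count += 1  # runs before next(), so also before the failing call
            next(diff)
    except StopIteration:
        pass
    return count


def count_with_for(diff):
    """Count the items the iterator actually yields."""
    return sum(1 for _ in diff)


# ndiff returns a one-shot iterator, so each count needs a fresh one.
original_count = count_like_original(difflib.ndiff(file1, file2))
loop_count = count_with_for(difflib.ndiff(file1, file2))

print(original_count, loop_count)  # the first number is one larger
```

Note that both numbers count every line ndiff yields, including unchanged lines, so neither is the number of mismatched rows.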

Answer 2

To answer the question about while 1:

Please read up on generators and iterators.

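difflib.ndiff() is a generator function, so calling it returns an iterator. The loop steps through that iterator by calling next(). Each time next() returns another line of the delta (the iterator moves on to the next item), count is incremented. When the iterator is exhausted, next() raises StopIteration, which the bare except swallows, and that is what ends the loop. Be aware that ndiff also yields the lines that are identical in both files (prefixed with two spaces), so the count is the total number of lines in the delta, not the number of mismatched lines.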
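As a rough illustration of the iterator behaviour, the sketch below drives ndiff by hand with next() and then keeps only the lines that really differ. The sample rows are invented. The prefixes are the ones documented for difflib.ndiff: "- " for lines only in the first sequence, "+ " for lines only in the second, two spaces for common lines, and "? " for hint lines that appear in neither input.

```python
import difflib

# Made-up sample data: the last row differs slightly between the files.
file1 = ["id,value\n", "a,1.0\n", "b,2.0\n"]
file2 = ["id,value\n", "a,1.0\n", "b,2.5\n"]

diff = difflib.ndiff(file1, file2)

# Roughly what a for loop does behind the scenes: call next() until the
# iterator signals exhaustion by raising StopIteration.
delta_lines = []
while True:
    try:
        line = next(diff)
    except StopIteration:
        break
    delta_lines.append(line)

# Keep only lines that exist in exactly one of the two files.
changed = [line for line in delta_lines if line.startswith(("- ", "+ "))]

for line in changed:
    print(line, end="")
```

Catching StopIteration explicitly, rather than using a bare except, keeps unrelated errors (a typo, a KeyboardInterrupt) from being silently hidden.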

Comparing two multi-column csv files
