How to ignore punctuation when comparing two files in Python

Time: 2019-04-10 07:55:39

Tags: python

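I'm trying to extend a Python comparison script so that it ignores differences in letter case (lowercase/uppercase names) and in the use of quotation marks. At the moment I have the following: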

compare.py

with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)

This correctly compares the two files and writes the differences to a third file. But suppose I have the following:

old.csv

"testCaseA",
"testCaseB",
"testCaseC"

new.csv

testCaseA,
testCaseB,

update.csv should be:

testCaseC

3 answers:

Answer 0: (score: 2)

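Use split() and strip() to remove the '\n' and the double quotes (") from the elements of each list, then find the difference with a list comprehension: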

with open('old.txt', 'r') as t1, open('new.txt', 'r') as t2:
    # Cut each line at the trailing ',\n', then strip leftover quotes, commas and newlines
    fileone = [i.split(',\n', 1)[0].strip('",\n') for i in t1.readlines()]
    filetwo = [i.split(',\n', 1)[0].strip('",\n') for i in t2.readlines()]

# print(fileone)   # ['testCaseA', 'testCaseB', 'testCaseC']
# print(filetwo)   # ['testCaseA', 'testCaseB']

s = set(filetwo)
print([x for x in fileone if x not in s])

Output

['testCaseC']
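
The same idea can be stretched to cover the case-insensitivity that the question also asks about. What follows is only a minimal sketch, assuming each line holds a single name; the helper names `normalize` and `missing_from` are made up for illustration and are not part of the answer above. Note that it deletes every ASCII punctuation character, including underscores and hyphens inside a name, which may be more than you want.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def normalize(line):
    """Return the line without punctuation or surrounding whitespace, case-folded."""
    return line.translate(_STRIP_PUNCT).strip().casefold()


def missing_from(old_lines, new_lines):
    """Return cleaned entries of old_lines whose normalized form is absent from new_lines."""
    seen = {normalize(line) for line in new_lines}
    result = []
    for line in old_lines:
        key = normalize(line)
        if key and key not in seen:
            # Keep the original spelling, minus quotes, commas and whitespace.
            result.append(line.strip().strip('",'))
    return result


# Sample data modelled on old.csv / new.csv, with the case changed on purpose.
old = ['"testCaseA",\n', '"testCaseB",\n', '"testCaseC"\n']
new = ['TESTCASEA,\n', 'testcaseb,\n']
print(missing_from(old, new))  # ['testCaseC']
```

To produce update.csv, the two lists could be read with readlines() as in the question, and each returned entry written out followed by '\n'.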

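Answer 1: (score: 0)

The difflib library is a better fit for this. From the documentation:

"This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs."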

This example from the documentation is a good starting point, and this page has many more examples :)
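
As a rough illustration of how difflib might be applied to the files in the question, here is a small sketch. It assumes the lines are first stripped of quotes and commas (the `clean` helper is hypothetical, not something difflib provides) and uses in-memory lists in place of the real files.

```python
import difflib

# Stand-ins for the contents of old.csv and new.csv.
old = ['"testCaseA",\n', '"testCaseB",\n', '"testCaseC"\n']
new = ['testCaseA,\n', 'testCaseB,\n']


def clean(lines):
    """Strip whitespace, commas and quotes so only the names are compared."""
    return [line.strip().strip('",\'') for line in lines]


# ndiff prefixes each line with '  ' (common), '- ' (only in the first
# sequence), '+ ' (only in the second) or '? ' (intraline hints).
diff = list(difflib.ndiff(clean(old), clean(new)))

# Keep the entries that appear only in the old file.
removed = [d[2:] for d in diff if d.startswith('- ')]
print(removed)  # ['testCaseC']
```

Unlike the set-based approach, ndiff compares the sequences in order, so lines that were merely moved also show up as a removal plus an addition.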

Answer 2: (score: 0)

I think the following code is prettier:

from pathlib import Path

fn1, fn2 = 'old.csv', 'new.csv'
ss1, ss2 = [Path(fn).read_text().splitlines() for fn in (fn1, fn2)]
for ss in (ss1, ss2):
    for i, v in enumerate(ss):
        ss[i] = v.strip('\'",')

set1, set2 = [set(ss) for ss in (ss1, ss2)]

for i, line in enumerate(ss1, 1):
    if line not in set2:
        print(f'line {i}: `{line}` : in {fn1}, but not in {fn2}')

for i, line in enumerate(ss2, 1):
    if line not in set1:
        print(f'line {i}: `{line}` : in {fn2}, but not in {fn1}')

The last two for loops could also be written as:

for ss, line_set, f1, f2 in ((ss1, set2, fn1, fn2), (ss2, set1, fn2, fn1)):
    for i, line in enumerate(ss, 1):
        if line not in line_set:
            print(f'line {i}: `{line}` : in {f1}, but not in {f2}')