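I'm trying to extend a Python comparison script so that it ignores differences such as letter case (lowercase/uppercase names) and the use of quotation marks. Currently, I have the following: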
compare.py
with open('old.csv', 'r') as t1, open('new.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

with open('update.csv', 'w') as outFile:
    for line in filetwo:
        if line not in fileone:
            outFile.write(line)
This correctly compares the two files and writes the differences to a third file. But suppose I have the following:
old.csv
"testCaseA",
"testCaseB",
"testCaseC"
new.csv
testCaseA,
testCaseB,
update.csv should be
testCaseC
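Answer 0 (score: 2)
Use split() and strip() to remove '\n', ',' and the double quote '"' from the elements of each list, then find the difference with a list comprehension: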
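with open('old.txt', 'r') as t1, open('new.txt', 'r') as t2:
    # Cut each line at its trailing ',\n', then strip leftover quotes or commas.
    # Note: a final line that ends in '\n' without a comma is not fully cleaned.
    fileone = [i.split(',\n', 1)[0].strip('"') for i in t1.readlines()]
    filetwo = [i.split(',\n', 1)[0].strip(',') for i in t2.readlines()]

# print(fileone) # ['testCaseA', 'testCaseB', 'testCaseC']
# print(filetwo) # ['testCaseA', 'testCaseB']
s = set(filetwo)  # set membership tests are O(1) on average
print([x for x in fileone if x not in s])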
Output:
['testCaseC']
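The code above compares entries case-sensitively and depends on each line ending in ',\n'. Below is a minimal sketch of a more forgiving variant, assuming that case-insensitive matching is what the question wants. The helper names normalize and missing_entries are hypothetical, and the sample lines stand in for the contents of old.csv and new.csv.

```python
def normalize(line: str) -> str:
    """Reduce a raw line to a comparable key (hypothetical helper)."""
    # Drop surrounding whitespace/newlines, then commas, then quotes,
    # and fold case so 'TestCaseA' and 'testcasea' compare equal.
    return line.strip().strip(',').strip('\'"').casefold()


def missing_entries(old_lines, new_lines):
    """Return cleaned entries of old_lines whose key is absent from new_lines."""
    new_keys = {normalize(line) for line in new_lines}
    result = []
    for line in old_lines:
        key = normalize(line)
        if key and key not in new_keys:
            # Keep the original spelling, minus quotes and commas.
            result.append(line.strip().strip(',').strip('\'"'))
    return result


# Sample data standing in for the lines read from old.csv and new.csv.
old_lines = ['"testCaseA",\n', '"testCaseB",\n', '"testCaseC"\n']
new_lines = ['TESTCASEA,\n', 'testCaseB,\n']
print(missing_entries(old_lines, new_lines))  # ['testCaseC']
```

To produce update.csv, the returned list could be written out one entry per line; blank lines are skipped because their key is empty.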
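Answer 1 (score: 0)
For this, it is better to use the difflib library. From the documentation:
This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs.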
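As a rough illustration of the difflib suggestion, the sketch below cleans each line first and then asks difflib.unified_diff for the differences. The clean helper and the in-memory sample lines are assumptions made for the example; they are not part of the original answer.

```python
import difflib


def clean(lines):
    """Strip newlines, trailing commas, and double quotes (hypothetical helper)."""
    return [line.strip().strip(',').strip('"') for line in lines]


# Sample data standing in for the lines read from old.csv and new.csv.
old_lines = clean(['"testCaseA",\n', '"testCaseB",\n', '"testCaseC"\n'])
new_lines = clean(['testCaseA,\n', 'testCaseB,\n'])

# unified_diff yields two header lines, then hunk markers and changes.
# lineterm='' suits input lines that no longer end in a newline.
diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile='old.csv', tofile='new.csv',
                                 lineterm=''))

# After the two header lines, a leading '-' marks a line found only in old.csv.
removed = [d[1:] for d in diff[2:] if d.startswith('-')]

print('\n'.join(diff))
print(removed)  # ['testCaseC']
```

Unlike a set-based comparison, difflib is sensitive to line order, so reordered but otherwise identical entries show up as changes.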
Answer 2 (score: 0)
I think the following code is nicer:
from pathlib import Path

fn1, fn2 = 'old.csv', 'new.csv'
ss1, ss2 = [Path(fn).read_text().splitlines() for fn in (fn1, fn2)]

for ss in (ss1, ss2):
    for i, v in enumerate(ss):
        ss[i] = v.strip('\'",')

set1, set2 = [set(ss) for ss in (ss1, ss2)]

for i, line in enumerate(ss1, 1):
    if line not in set2:
        print(f'line {i}: `{line}` : in {fn1}, but not in {fn2}')

for i, line in enumerate(ss2, 1):
    if line not in set1:
        print(f'line {i}: `{line}` : in {fn2}, but not in {fn1}')
The last two for loops can also be written as:
for ss, line_set, f1, f2 in ((ss1, set2, fn1, fn2), (ss2, set1, fn2, fn1)):
    for i, line in enumerate(ss, 1):
        if line not in line_set:
            print(f'line {i}: `{line}` : in {f1}, but not in {f2}')