I have two lists, and I use the following function to assign line numbers (similar to nl in Unix):
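import fileinput
from math import log10

def nl(inFile):
    numberedLines = []
    # First pass: prefix every line with its line number.
    for line in fileinput.input(inFile):
        numberedLines.append(str(fileinput.lineno()) + ': ' + line)
    # Width of the largest line number, used to right-align the prefixes.
    numberWidth = int(log10(fileinput.lineno())) + 1
    # Second pass: pad each number to the common width.
    for i, line in enumerate(numberedLines):
        num, rest = line.split(':', 1)
        fnum = str(num).rjust(numberWidth)
        numberedLines[i] = ':'.join([fnum, rest])
    return ''.join(numberedLines)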
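This returns a list like the following:
1: 12 14
2: 20 49
3: 21 28
For the infile I use, the line numbers are significant. My second list is structured the same way, but its line numbers mean nothing. I need to find the differences against the second file's list and return the line numbers from the first file. For example, if the second file contains:
5: 12 14
48: 20 49
I want to return only 3, which is the line number in the first list of the value that is missing from the second.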
Here is what I have tried:
oldtxt = 'master_list.txt' # Line numbers are significant
newFile = 'list2compare.txt' # Line numbers don't matter
s = set(nl(oldtxt))
diff = [x for x in (newFile) if x not in s]
print diff
This returns ['12 14\n', '20 49\n', '21 28\n'], which is clearly not what I need. Any ideas?
Answer 0 (score: 0)
How about the following:
f1 = """\
12 14
20 49
21 28
"""
f2 = """\
12 14
20 49
"""
def parse(lines):
    "Take a list of lines, turn into a dict of line number => value pairs"
    return dict((i + 1, v) for i, v in enumerate(l for l in lines if l))

def diff(a, b):
    """
    Given two dicts from parse(), go through each lineno => value pair in a
    and, if the value is in b's values, discard it; finally, return the
    remaining lineno => value pairs.
    """
    bvals = frozenset(b.values())
    return dict((ak, av) for ak, av in a.items() if av not in bvals)

def fmt(d):
    "Turn lineno => value pairs into ' lineno: value' strings"
    nw = len(str(max(d.keys())))
    # Sort so the output follows line-number order regardless of dict ordering.
    return ["{0:>{1}}: {2}".format(k, nw, v) for k, v in sorted(d.items())]
d1 = parse(f1.splitlines())
print(d1)
print("")
d2 = parse(f2.splitlines())
print(d2)
print("")
d = diff(d1, d2)
print(d)
print("")
print("\n".join(fmt(d)))
This gives me the output:
{1: '12 14', 2: '20 49', 3: '21 28'}
{1: '12 14', 2: '20 49'}
{3: '21 28'}
3: 21 28
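The second file in the question carries its own (meaningless) numbers, such as `5: 12 14`, and those prefixes would have to be stripped before values can be compared. Below is a minimal sketch of one way to do that. It assumes Python 3 and that every numbered line has the form `<number>: <value>`. The helper names `strip_number` and `missing_line_numbers` are made up for illustration and are not part of the answer above.

```python
def strip_number(line):
    """Drop a leading 'N:' prefix (if any) and surrounding whitespace."""
    head, sep, rest = line.partition(':')
    # Only treat the part before ':' as a prefix when it is purely numeric.
    if sep and head.strip().isdigit():
        return rest.strip()
    return line.strip()


def missing_line_numbers(master_lines, other_lines):
    """Return 1-based master line numbers whose value is absent from other_lines."""
    other_values = {strip_number(l) for l in other_lines if l.strip()}
    return [
        number
        for number, line in enumerate(master_lines, start=1)
        if line.strip() and strip_number(line) not in other_values
    ]


master = ["12 14", "20 49", "21 28"]
other = ["5: 12 14", "48: 20 49"]
print(missing_line_numbers(master, other))  # prints [3]
```

Unlike `parse()`, this sketch counts blank lines when numbering, so the reported numbers match physical line positions in the master file. Whether that is the desired numbering depends on the data.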
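Answer 1 (score: 0)
I'll take a stab at this ;) It sounds like you are after the line numbers in the master file of lines whose content also appears in the comparison file. Is that what you're after? In that case, I suggest...
Master file contents...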
1 2 3 4
test
6 7 8 9
compare
me
Comparison file contents...
6 7 8 9
10 11 12 13
me
Code:
master_file = open('file path').read()
compare_file = open('file path').read()
lines_master = master_file.splitlines()
lines_compare = compare_file.splitlines()
same_lines = []
for i, line in enumerate(lines_master):
    # Record the 1-based line number of every master line found in the comparison file.
    if line in lines_compare:
        same_lines.append(i + 1)
print(same_lines)
The result is [3, 5]
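The question asks for the master lines that are absent from the comparison file, which is the inverse of the test above. Here is a minimal sketch of that variant. It assumes Python 3 and uses in-memory lists in place of the two files. The name `absent_line_numbers` is hypothetical, and a set replaces the list lookup so that each membership test is constant time on average.

```python
def absent_line_numbers(lines_master, lines_compare):
    """Return 1-based numbers of master lines that never occur in lines_compare."""
    compare_set = set(lines_compare)
    return [i for i, line in enumerate(lines_master, start=1)
            if line not in compare_set]


lines_master = ["1 2 3 4", "test", "6 7 8 9", "compare", "me"]
lines_compare = ["6 7 8 9", "10 11 12 13", "me"]
print(absent_line_numbers(lines_master, lines_compare))  # prints [1, 2, 4]
```

The comparison is exact, so trailing whitespace or a numeric prefix in either file would have to be normalized first.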
Answer 2 (score: 0)
You can use difflib for this:
>>> f1 = """1 2 3 4
... test
... 6 7 8 9
... compare
... me
... """
>>>
>>> f2 = """6 7 8 9
... 10 11 12 13
... me
... """
>>>
>>> import difflib
>>> for line in difflib.ndiff(f1.splitlines(), f2.splitlines()):
...     if line.startswith('-'):
...         print("Second file is missing line: '%s'" % line)
...     if line.startswith('+'):
...         print("Second file contains additional line: '%s'" % line)
...
Second file is missing line: '- 1 2 3 4'
Second file is missing line: '- test'
Second file is missing line: '- compare'
Second file contains additional line: '+ 10 11 12 13'
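`ndiff` reports the differing lines but not their positions. If the master line numbers are needed, one option is `difflib.SequenceMatcher.get_opcodes()`, which yields index ranges into both sequences. The following is a minimal sketch under that assumption, written for Python 3; `removed_line_numbers` is a hypothetical helper name.

```python
import difflib


def removed_line_numbers(master_lines, other_lines):
    """Return 1-based master line numbers that difflib reports as deleted or replaced."""
    matcher = difflib.SequenceMatcher(None, master_lines, other_lines)
    numbers = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both mean master_lines[i1:i2] has no counterpart.
        if tag in ('delete', 'replace'):
            numbers.extend(range(i1 + 1, i2 + 1))
    return numbers


f1 = ["1 2 3 4", "test", "6 7 8 9", "compare", "me"]
f2 = ["6 7 8 9", "10 11 12 13", "me"]
print(removed_line_numbers(f1, f2))  # prints [1, 2, 4]
```

Note that difflib compares sequences in order, so a line that merely moved to a different position may be reported as removed. When order does not matter, a set-based membership test is likely the simpler choice.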