Using set difference to get the line numbers of missing values

Time: 2012-09-27 15:19:39

Tags: python list set compare difflib

I have two lists, and I use the following function to assign line numbers (similar to nl in Unix):

import fileinput
from math import log10

def nl(inFile):
    numberedLines = []
    # Prefix each line with its line number, as nl does
    for line in fileinput.input(inFile):
        numberedLines.append(str(fileinput.lineno()) + ':  ' + line)
    # Width of the largest line number, used to right-align every number
    numberWidth = int(log10(fileinput.lineno())) + 1
    for i, line in enumerate(numberedLines):
        num, rest = line.split(':',1)
        fnum = str(num).rjust(numberWidth)
        numberedLines[i] = ':'.join([fnum, rest])
    return ''.join(numberedLines)
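The same numbering can be written more compactly with enumerate(..., 1). This is a minimal sketch, not the asker's code: the name nl_lines and the choice to take a sequence of lines instead of a filename are assumptions. Computing the width as len(str(count)) also avoids the log10(0) ValueError that nl() would raise on an empty file.

```python
def nl_lines(lines):
    """Hypothetical helper: number lines like `nl`, right-aligning the numbers.

    Returns one string; each line keeps its own trailing newline, if any.
    """
    lines = list(lines)
    # Width of the largest line number; len(str(0)) == 1 keeps empty input safe
    width = len(str(len(lines)))
    return ''.join('{0:>{1}}:  {2}'.format(n, width, line)
                   for n, line in enumerate(lines, 1))
```

An open file object can be passed directly, since iterating over a file yields its lines with their newlines attached.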

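This returns a listing like the following:

1: 12 14
2: 20 49
3: 21 28

For the infile I'm using, the line numbers are very important. My second list is structured the same way, but its line numbers carry no meaning. I need to find the difference between the two lists and return the line numbers from the first file. For example, if the second file has:

5: 12 14
48: 20 49

I only want to return 3, which is the line number, in the first list, of the missing value.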
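Because both listings carry an "N:" prefix, the comparison has to look only at the text after the first colon, while remembering the number from the first listing. A minimal sketch, assuming the "number: values" layout shown above; split_numbered is a hypothetical helper name:

```python
def split_numbered(line):
    """Hypothetical helper: split '  3:  21 28' into (3, '21 28')."""
    num, rest = line.split(':', 1)   # split only at the first colon
    return int(num), rest.strip()    # int() tolerates the rjust padding

master = ['1: 12 14', '2: 20 49', '3: 21 28']
other = ['5: 12 14', '48: 20 49']

# Keep only the values of the second listing; its numbers don't matter
other_values = set(split_numbered(l)[1] for l in other)
missing = [n for n, v in map(split_numbered, master) if v not in other_values]
print(missing)  # [3]
```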

Here is what I tried:

oldtxt = 'master_list.txt'  # Line numbers are significant
newFile = 'list2compare.txt' # Line numbers don't matter

s = set(nl(oldtxt))
diff = [x for x in (newFile) if x not in s]
print diff

It returns: ['12 14\n', '20 49\n', '21 28\n'], which is obviously not what I need. Any ideas?
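Two things keep the attempt from comparing lines. (newFile) is just the filename string (the parentheses neither make a tuple nor open the file), so the comprehension iterates over the characters of 'list2compare.txt'. And nl() returns one joined string, so set(nl(oldtxt)) is a set of single characters. A minimal sketch of the intended set-difference idea, assuming both files hold plain value lines without the "N:" prefix, builds a set from the second file's lines and keeps the first file's line numbers (missing_line_numbers is a hypothetical name):

```python
def missing_line_numbers(master_lines, compare_lines):
    """Return 1-based line numbers of master lines whose text does not
    appear anywhere in compare_lines (order in compare_lines is ignored)."""
    # strip() so trailing newlines or spaces don't cause false mismatches
    present = set(line.strip() for line in compare_lines)
    return [n for n, line in enumerate(master_lines, 1)
            if line.strip() not in present]

master = ['12 14\n', '20 49\n', '21 28\n']
compare = ['20 49\n', '12 14\n']   # order and numbering don't matter
print(missing_line_numbers(master, compare))  # [3]
```

With the actual files, open file objects can be passed directly: with open(oldtxt) as a, open(newFile) as b: print(missing_line_numbers(a, b)).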

3 answers:

Answer 0 (score: 0)

How about the following:

f1 = """\
12 14
20 49
21 28
"""

f2 = """\
12 14
20 49
"""

def parse(lines):
  "Take a list of lines, turn into a dict of line number => value pairs"
  return dict((i + 1, v) for i, v in enumerate(l for l in lines if l))

def diff(a, b):
  """
  Given two dicts from parse(), go through each lineno => value pair in a and
  discard it if the value is among b's values; finally, return the remaining
  lineno => value pairs
  """
  """
  bvals = frozenset(b.values())
  return dict((ak, av) for ak, av in a.items() if av not in bvals)

def fmt(d):
  "Turn lineno => value pairs into '  lineno: value' strings, sorted by lineno"
  nw = len(str(max(d.keys())))
  return ["{0:>{1}}: {2}".format(k, nw, v) for k, v in sorted(d.items())]

d1 = parse(f1.splitlines())
print d1
print
d2 = parse(f2.splitlines())
print d2
print
d = diff(d1, d2)
print d
print
print "\n".join(fmt(d))

This gives me the output:

{1: '12 14', 2: '20 49', 3: '21 28'}

{1: '12 14', 2: '20 49'}

{3: '21 28'}

3: 21 28
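One caveat with parse(): it enumerates only the non-empty lines, so if a file contains blank lines the numbers drift away from the real file line numbers, which the question says matter. A small variant (parse_keep_numbers is a hypothetical name) numbers every line first and only then drops blanks. For the f1/f2 strings above, which contain no blank lines, both functions produce the same dicts.

```python
def parse_keep_numbers(lines):
    """Like parse(), but number lines before dropping blank ones,
    so keys stay equal to the lines' real 1-based positions."""
    return dict((i, v) for i, v in enumerate(lines, 1) if v)

print(parse_keep_numbers(['12 14', '', '21 28']))  # {1: '12 14', 3: '21 28'}
```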

Answer 1 (score: 0)

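I'll take a stab at this ;) It sounds like you're after the line numbers of master-file lines whose content also appears in the compare file. Is that what you're after? If so, I'd suggest...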

Master file contents...

1 2 3 4
test
6 7 8 9
compare
me

Compare file contents...

6 7 8 9
10 11 12 13
me

Code:

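# 'file path' is a placeholder for each file's real path
with open('file path') as f:
    lines_master = f.read().splitlines()
with open('file path') as f:
    lines_compare = f.read().splitlines()

# A set makes each membership test O(1) instead of scanning a list
compare_set = set(lines_compare)
same_lines = []
for i, line in enumerate(lines_master):
    if line in compare_set:
        same_lines.append(i + 1)  # 1-based line numbers, like nl

print same_lines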

The result is [3, 5]
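The question asks for the opposite set: master lines that are not in the compare file. Flipping the membership test gives those. A sketch using the same sample contents as above, written inline instead of read from files:

```python
master = ['1 2 3 4', 'test', '6 7 8 9', 'compare', 'me']
compare = ['6 7 8 9', '10 11 12 13', 'me']

compare_set = set(compare)
# 1-based numbers of master lines whose text never appears in the compare file
missing = [i for i, line in enumerate(master, 1) if line not in compare_set]
print(missing)  # [1, 2, 4]
```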

Answer 2 (score: 0)

You can use difflib for this:

>>> f1 = """1 2 3 4
... test
... 6 7 8 9
... compare
... me
... """
>>> 
>>> f2 = """6 7 8 9
... 10 11 12 13
... me
... """
>>>
>>> import difflib
>>> for line in difflib.ndiff(f1.splitlines(), f2.splitlines()):
...    if line.startswith('-'):
...       print "Second file is missing line: '%s'" % line
...    if line.startswith('+'):
...       print "Second file contains additional line: '%s'" % line
... 
Second file is missing line: '- 1 2 3 4'
Second file is missing line: '- test'
Second file is missing line: '- compare'
Second file contains additional line: '+ 10 11 12 13'
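ndiff reports the differing lines but not their line numbers. A sketch using difflib.SequenceMatcher.get_opcodes(), which returns index ranges into both sequences, can recover 1-based line numbers from the first file. Treating both 'delete' and 'replace' opcodes as "missing" is an assumption of this sketch, and removed_line_numbers is a hypothetical name.

```python
import difflib

def removed_line_numbers(a, b):
    """1-based numbers of lines in a that SequenceMatcher leaves unmatched in b."""
    numbers = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # For 'delete' and 'replace', a[i1:i2] has no matching lines in b
        if tag in ('delete', 'replace'):
            numbers.extend(range(i1 + 1, i2 + 1))
    return numbers

f1 = ['1 2 3 4', 'test', '6 7 8 9', 'compare', 'me']
f2 = ['6 7 8 9', '10 11 12 13', 'me']
print(removed_line_numbers(f1, f2))  # [1, 2, 4]
```

Unlike the set-based answers, this is order-sensitive: a line that merely moved in the second file can be reported as missing (removed_line_numbers(['a', 'b'], ['b', 'a']) gives [2]), so it fits less well when the second file's order doesn't matter.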