I have two files that look like this, with a few differences between them:
First file:
{16:[3, [-7, 87, 20, 32]]}
{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 85, 20, 32]]}
{20:[2, [9, 94, 16, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [2, 76, 19, 31]]}
{22:[2, [15, 97, 16, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 16, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 19, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 16, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
Second file:
{16:[3, [-7, 86, 20, 32]]}
{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 84, 20, 32]]}
{20:[2, [9, 94, 15, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [1, 76, 19, 31]]}
{22:[2, [15, 97, 17, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 18, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 20, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 17, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
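I use difflib to compare them and print the lines that differ between them.
What I want to do is print the minimum and maximum frame values that share the same id.
The frame is the key of each line, so in this case the frames range from 16 to 26. The id is the value that precedes each list of 4 values. So the first line has id 3. The second line has two ids, 2 and 3.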
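To make that layout concrete, here is a minimal Python 3 sketch that splits one line into its frame and its ids. The helper name `parse_line` is made up for illustration, and the sketch assumes every line is a one-entry dict literal whose value alternates id, box, id, box.

```python
import ast

def parse_line(line):
    """Return (frame, ids) for one line such as '{17:[2, [...], 3, [...]]}'.

    Hypothetical helper; assumes exactly one frame key per line.
    """
    # literal_eval only accepts Python literals, so it is safer than eval
    record = ast.literal_eval(line.strip())
    (frame, values), = record.items()
    # ids sit at the even positions, each followed by its 4-value list
    ids = values[0::2]
    return frame, ids

print(parse_line("{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}"))
# prints (17, [2, 3])
```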
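So an example of what I would like to write out is:
17 - 36
given that one of the frames sharing an id differs from the file I am comparing against.
For each such difference, I need to write out a new file containing only the start frame and the end frame; I will then take care of concatenating other strings to each file.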
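This is my current difflib usage, which prints every line that differs:
import difflib

def compare(f1, f2):
    with open(f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
        diff = difflib.ndiff(fin1.readlines(), fin2.readlines())
        outcome = ''.join(x[2:] for x in diff if x.startswith('- '))
        print outcome
How can I adapt this block to achieve what is described above?
Note that the two files share the same number of frames but not the same ids, so I need to write two different files for each difference, probably into folders. So if the two files have 20 differences, I need two main folders, one per original file, each containing a text file for every start and end frame of the same id.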
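Answer 0 (score: 1)
Assuming your list of differences is the file content you gave at the beginning of your post, I did it in 2 passes. The first pass gets the list of frames for each id: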
>>> from collections import defaultdict
>>> diffs = defaultdict(list)
>>> for line in s.split('\n'):
...     d = eval(line)  # We have a dict
...     for k in d:  # Only one value, k is the frame
...         # Only get even values for ids
...         for i in range(0, len(d[k]), 2):
...             diffs[d[k][i]].append(k)
>>> diffs # We now have a dict with ids as keys :
defaultdict(<type 'list'>, {10: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 2: [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], 3: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 29: [31, 32, 33, 34, 35, 36]})
Now we get the ranges for each id, thanks to this other SO post, which helps to get ranges from a list of indices:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> for id_ in diffs:
...     diffs[id_].sort()
...     for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
...         group = map(itemgetter(1), g)
...         print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])
id 10 : 23 -> 36
id 2 : 17 -> 33
id 3 : 16 -> 36
id 29 : 31 -> 36
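The grouping trick relies on the fact that, within a run of consecutive integers, the difference between each value and its position in the sorted list stays constant, so `groupby` can use that difference as its key. A minimal sketch of the same idea in Python 3 syntax (the answer's own code is Python 2; `consecutive_ranges` is a name chosen here for illustration):

```python
from itertools import groupby

def consecutive_ranges(frames):
    """Collapse a collection of ints into (start, end) runs of consecutive values."""
    ordered = sorted(set(frames))
    ranges = []
    # value - index is constant inside a run of consecutive integers
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in run]
        ranges.append((run[0], run[-1]))
    return ranges

print(consecutive_ranges([17, 18, 19, 20, 21, 22, 23, 24, 26]))
# prints [(17, 24), (26, 26)]
```

Python 3 no longer accepts `lambda (i, x): ...`, which is why the sketch indexes into the pair instead.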
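Then, for each id, you have the ranges of differences. I guess that with a little tweaking you can get what you want.
EDIT: here is the final answer with the same block: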
>>> def compare(f1, f2):
...     # 2 embedded 'with' because I'm on Python 2.5 :-)
...     with open(f1+'.txt', 'r') as fin1:
...         with open(f2+'.txt', 'r') as fin2:
...             lines1 = fin1.readlines()
...             lines2 = fin2.readlines()
...     # Do not forget the strip function to remove unnecessary '\n'
...     diff_lines = [l.strip() for l in lines1 if l not in lines2]
...     # Ok, we have our differences (very basic)
...     diffs = defaultdict(list)
...     for line in diff_lines:
...         d = eval(line)  # We have a dict
...         for k in d:
...             list_ids = d[k]  # Only one value, k is the frame
...             for i in range(0, len(d[k]), 2):
...                 diffs[d[k][i]].append(k)
...     for id_ in diffs:
...         diffs[id_].sort()
...         for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
...             group = map(itemgetter(1), g)
...             print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])
>>> compare(r'E:\CFM\Dev\Python\test\f1', r'E:\CFM\Dev\Python\test\f2')
id 2 : 17 -> 24
id 2 : 26 -> 26
id 3 : 16 -> 24
id 3 : 26 -> 26
id 10 : 23 -> 24
id 10 : 26 -> 26
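The question also asks for one text file per range, holding just the start and end frame, in a separate folder per source file. A rough Python 3 sketch of that last step follows. The function names, the folder layout and the file-name pattern (`id<ID>_<start>_<end>.txt`) are all assumptions made for illustration, not something the question specifies. It also swaps `eval` for `ast.literal_eval`, so that only literals are parsed, and compares stripped lines rather than raw ones.

```python
import ast
import os
from collections import defaultdict
from itertools import groupby

def differing_ranges(lines_a, lines_b):
    """Map each id to the (start, end) frame runs of lines in lines_a absent from lines_b."""
    other = {line.strip() for line in lines_b}
    frames_by_id = defaultdict(list)
    for line in lines_a:
        line = line.strip()
        if not line or line in other:
            continue
        # one-entry dict: {frame: [id, box, id, box, ...]}
        record = ast.literal_eval(line)
        for frame, values in record.items():
            for id_ in values[0::2]:
                frames_by_id[id_].append(frame)
    result = {}
    for id_, frames in frames_by_id.items():
        frames.sort()
        runs = []
        # value - index is constant inside a run of consecutive frames
        for _, run in groupby(enumerate(frames), key=lambda p: p[1] - p[0]):
            run = [frame for _, frame in run]
            runs.append((run[0], run[-1]))
        result[id_] = runs
    return result

def write_ranges(ranges, out_dir):
    """Write one small text file per (id, start, end) run into out_dir (assumed layout)."""
    os.makedirs(out_dir, exist_ok=True)
    for id_, runs in ranges.items():
        for start, end in runs:
            name = "id{0}_{1}_{2}.txt".format(id_, start, end)
            with open(os.path.join(out_dir, name), "w") as fout:
                fout.write("{0} - {1}\n".format(start, end))
```

Calling `differing_ranges(lines1, lines2)` and then `differing_ranges(lines2, lines1)`, each followed by `write_ranges` with its own folder, would give the two folders, one per original file.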