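My end goal: I need a variant of zip_longest() that, given an arbitrary number of sequences, yields them side by side, filling in blanks wherever they differ.
The analogy when working with files is typing vimdiff file1 file2 file3 ...
For example, given the sequences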
a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
I need a function that yields these tuples:
"foo", "foo", "foo"
"bar", None, "bar"
"baz", "baz", None
"asd", None, None
I managed to do it using difflib.SequenceMatcher. However, it only works for two sequences:
from difflib import SequenceMatcher

def zip_diff2(a, b, fillvalue=None):
    matches = SequenceMatcher(None, a, b).get_matching_blocks()
    for match, next_match in zip([None] + matches, matches + [None]):
        if match is None:
            # Process disjoined elements before the first match
            for i in range(0, next_match.a):
                yield a[i], fillvalue
            for i in range(0, next_match.b):
                yield fillvalue, b[i]
        else:
            for i in range(match.size):
                yield a[match.a + i], b[match.b + i]
            if next_match is None:
                a_end = len(a)
                b_end = len(b)
            else:
                a_end = next_match.a
                b_end = next_match.b
            for i in range(match.a + match.size, a_end):
                yield a[i], fillvalue
            for i in range(match.b + match.size, b_end):
                yield fillvalue, b[i]
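How can I make it work on an arbitrary number of sequences?
Answer 0 (score: 0)
To achieve what you want, I think it is necessary to first build a base sequence containing every value that appears in the given sequences. For that, I wrote this code: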
def build_base_sequence(*sequences):
    # Getting the biggest sequence size.
    max_count = 0
    for sequence in sequences:
        max_count = max(max_count, len(sequence))
    # Normalizing the sequences to have all the same size.
    new_sequences = []
    for sequence in sequences:
        new_sequence = sequence + [None] * max_count
        new_sequences.append(new_sequence[:max_count])
    # Building the base sequence:
    base_sequence = []
    for values in zip(*new_sequences):
        for value in values:
            if value is None or value in base_sequence:
                continue
            base_sequence.append(value)
    return base_sequence
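To make the idea of a base sequence concrete, here is a small sketch of what it looks like for the lists in the question. It is a condensed restatement written with itertools.zip_longest rather than the exact code above, and it assumes the inputs never contain None as a real value. The second call shows a case where the base order does not follow the order of every input.

```python
from itertools import zip_longest

def build_base_sequence(*sequences):
    # Condensed restatement (an assumption, not the exact code above):
    # walk the sequences column by column and keep each value the first
    # time it is seen. None is treated as padding and skipped.
    base = []
    for values in zip_longest(*sequences):
        for value in values:
            if value is not None and value not in base:
                base.append(value)
    return base

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
print(build_base_sequence(a, b, c))  # ['foo', 'bar', 'baz', 'asd']

# Values are collected column by column, so "c" is seen before "b" here.
print(build_base_sequence(["a", "c"], ["a", "b", "c"]))  # ['a', 'c', 'b']
```

Because the base sequence is built column by column, the result depends on where each value first appears, not on a real diff of the inputs.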
You could use your function, calling it multiple times. I think using difflib.SequenceMatcher is too complicated, though, so I wrote my own code:
def zip_diff(*sequences):
    base_sequence = build_base_sequence(*sequences)
    # Building new sequences based on base_sequence
    new_sequences = []
    for sequence in sequences:
        new_sequence = [None] * len(base_sequence)
        for value in sequence:
            new_sequence[base_sequence.index(value)] = value
        new_sequences.append(new_sequence)
    # Now let's yield the values
    for values in zip(*new_sequences):
        yield values
This is rather newbie / dummy / naive code but, hey, it does the job!
>>> a = ["foo", "bar", "baz", "asd"]
>>> b = ["foo", "baz"]
>>> c = ["foo", "bar"]
>>> for values in zip_diff(a, b, c):
...     print(values)
...
('foo', 'foo', 'foo')
('bar', None, 'bar')
('baz', 'baz', None)
('asd', None, None)
>>>
I hope it helps you in some way.
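For completeness, the other route mentioned above (reusing the pairwise difflib approach several times) could look roughly like the sketch below. It is only a sketch under stated assumptions: zip_diff_fold is a hypothetical name, it targets Python 3, the items must be hashable (as SequenceMatcher requires), and no item may be equal to fillvalue. It uses SequenceMatcher.get_opcodes() instead of get_matching_blocks(), and it aligns each new sequence against the rows built so far, so the result can depend on the order in which the sequences are passed.

```python
from difflib import SequenceMatcher

def zip_diff_fold(*sequences, fillvalue=None):
    """Hypothetical helper: align the sequences one at a time, left to right."""
    rows = []  # each row holds one slot per sequence processed so far
    for n, seq in enumerate(sequences):
        seq = list(seq)
        # Represent each existing row by its first non-fill value.
        # Assumes no real item is the fillvalue itself.
        keys = [next(v for v in row if v is not fillvalue) for row in rows]
        merged = []
        matcher = SequenceMatcher(None, keys, seq, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Matching items extend the existing rows.
                for row, value in zip(rows[i1:i2], seq[j1:j2]):
                    merged.append(row + [value])
            else:
                # Rows without a counterpart get a blank in the new column.
                for row in rows[i1:i2]:
                    merged.append(row + [fillvalue])
                # Items only in seq start new rows, blank in earlier columns.
                for value in seq[j1:j2]:
                    merged.append([fillvalue] * n + [value])
        rows = merged
    for row in rows:
        yield tuple(row)

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
for values in zip_diff_fold(a, b, c):
    print(values)
```

With the lists from the question this prints the same four tuples as zip_diff above. Unlike the index-based version, it keeps repeated values in a sequence as separate rows, at the cost of running one diff per input sequence.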