Question

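My end goal: I need a variant of zip_longest() that, given an arbitrary number of sequences, yields them side by side, filling in blanks wherever they differ.

The file-based analogue is what you see when you run vimdiff file1 file2 file3 ...

For example, given the sequences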

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]

I need a function that yields these tuples:

"foo", "foo", "foo"
"bar", None, "bar"
"baz", "baz", None
"asd", None, None

I managed to do it using difflib.SequenceMatcher. However, it only works for two sequences:

from difflib import SequenceMatcher

def zip_diff2(a, b, fillvalue=None):
    matches = SequenceMatcher(None, a, b).get_matching_blocks()
    for match, next_match in zip([None] + matches, matches + [None]):

        if match is None:
            # Process the unmatched elements before the first match
            for i in range(0, next_match.a):
                yield a[i], fillvalue
            for i in range(0, next_match.b):
                yield fillvalue, b[i]
        else:
            for i in range(match.size):
                yield a[match.a + i], b[match.b + i]

            if next_match is None:
                a_end = len(a)
                b_end = len(b)
            else:
                a_end = next_match.a
                b_end = next_match.b

            for i in range(match.a + match.size, a_end):
                yield a[i], fillvalue
            for i in range(match.b + match.size, b_end):
                yield fillvalue, b[i]
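
For reference, here is a small sketch of what get_matching_blocks() returns for the first two example sequences. The final zero-size block is a sentinel that difflib always appends, which is why the generator above also covers the trailing unmatched elements.

```python
from difflib import SequenceMatcher

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]

# Each block is a named tuple (a, b, size): a[a:a+size] == b[b:b+size].
blocks = SequenceMatcher(None, a, b).get_matching_blocks()
for block in blocks:
    print(block)
# Match(a=0, b=0, size=1)
# Match(a=2, b=1, size=1)
# Match(a=4, b=2, size=0)   <- sentinel at (len(a), len(b))
```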

How can I make it work on an arbitrary number of sequences?

Answer 1

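To achieve what you want, I think you first need to build a base sequence containing all the possible values from the given sequences. For that, I wrote this code: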

def build_base_sequence(*sequences):
    # Getting the biggest sequence size.
    max_count = 0
    for sequence in sequences:
        max_count = max(max_count, len(sequence))

    # Normalizing the sequences to have all the same size.
    new_sequences = []
    for sequence in sequences:
        new_sequence = sequence + [None] * max_count
        new_sequences.append(new_sequence[:max_count])

    # Building the base sequence:
    base_sequence = []
    for values in zip(*new_sequences):
        for value in values:
            if value is None or value in base_sequence:
                continue
            base_sequence.append(value)

    return base_sequence

You could then use your own function, calling it several times. I found difflib.SequenceMatcher too complicated to use, though, so I wrote my own code:

def zip_diff(*sequences):
    base_sequence = build_base_sequence(*sequences)

    # Building new sequences based on base_sequence
    new_sequences = []
    for sequence in sequences:
        new_sequence = [None] * len(base_sequence)
        for value in sequence:
            new_sequence[base_sequence.index(value)] = value
        new_sequences.append(new_sequence)

    # Now let's yield the values
    for values in zip(*new_sequences):
        yield values

This is somewhat newbie / dummy / naive code, but hey, it works!

>>> a = ["foo", "bar", "baz", "asd"]
>>> b = ["foo", "baz"]
>>> c = ["foo", "bar"]
>>> for values in zip_diff(a, b, c):
...     print(values)
...
('foo', 'foo', 'foo')
('bar', None, 'bar')
('baz', 'baz', None)
('asd', None, None)
>>>
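
One caveat with the base-sequence approach: base_sequence.index(value) always returns the first position of a value, so repeated items in a sequence collapse into a single row. If that matters, the "call it several times" idea can be sketched by folding SequenceMatcher over the sequences one at a time. This is only a sketch under a few assumptions: the name zip_diff_many is made up for illustration, it needs Python 3 (keyword-only fillvalue), the items must be hashable, and the result depends on the order in which the sequences are passed, so it is not guaranteed to be the best possible alignment.

```python
from difflib import SequenceMatcher

def zip_diff_many(*sequences, fillvalue=None):
    """Align any number of sequences by merging them in one at a time."""
    if not sequences:
        return
    # rows[i] holds one slot per sequence processed so far;
    # merged[i] is the actual (non-fill) value that row i stands for.
    rows = [[item] for item in sequences[0]]
    merged = list(sequences[0])
    for count, seq in enumerate(sequences[1:], start=1):
        seq = list(seq)
        new_rows, new_merged = [], []
        i = j = 0  # first positions in merged / seq not yet emitted
        matcher = SequenceMatcher(None, merged, seq, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Rows that exist only in the earlier sequences.
            for k in range(i, block.a):
                new_rows.append(rows[k] + [fillvalue])
                new_merged.append(merged[k])
            # Items that exist only in the new sequence.
            for k in range(j, block.b):
                new_rows.append([fillvalue] * count + [seq[k]])
                new_merged.append(seq[k])
            # The matching run itself.
            for k in range(block.size):
                new_rows.append(rows[block.a + k] + [seq[block.b + k]])
                new_merged.append(seq[block.b + k])
            i, j = block.a + block.size, block.b + block.size
        rows, merged = new_rows, new_merged
    for row in rows:
        yield tuple(row)

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
for values in zip_diff_many(a, b, c):
    print(values)
# ('foo', 'foo', 'foo')
# ('bar', None, 'bar')
# ('baz', 'baz', None)
# ('asd', None, None)
```

The sentinel block that get_matching_blocks() appends at the end takes care of the trailing unmatched items, so no special case is needed after the loop.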

I hope this helps in some way.

difflib.SequenceMatcher with more than two sequences
