difflib.SequenceMatcher for more than two sequences

Date: 2015-07-09 16:57:07

Tags: python

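My end goal: I need a variant of zip_longest() that, given an arbitrary number of sequences, yields them side by side and fills in blanks wherever they differ.

The analogy for files is what you see when you run vimdiff file1 file2 file3 ...

For example, given the sequences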

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]

I need a function that yields these tuples:

"foo", "foo", "foo"
"bar", None, "bar"
"baz", "baz", None
"asd", None, None

I managed to do this with difflib.SequenceMatcher. However, it only works for two sequences:

from difflib import SequenceMatcher

def zip_diff2(a, b, fillvalue=None):
    matches = SequenceMatcher(None, a, b).get_matching_blocks()
    for match, next_match in zip([None] + matches, matches + [None]):

        if match is None:
            # Process disjoined elements before the first match
            for i in range(0, next_match.a):
                yield a[i], fillvalue
            for i in range(0, next_match.b):
                yield fillvalue, b[i]
        else:
            for i in range(match.size):
                yield a[match.a + i], b[match.b + i]

            if next_match is None:
                a_end = len(a)
                b_end = len(b)
            else:
                a_end = next_match.a
                b_end = next_match.b

            for i in range(match.a + match.size, a_end):
                yield a[i], fillvalue
            for i in range(match.b + match.size, b_end):
                yield fillvalue, b[i]
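
To make it easier to see what zip_diff2 iterates over, here is a small sketch that prints the matching blocks for the first two example sequences. The variable names are only illustrative. Note that get_matching_blocks() always ends its result with a zero-size sentinel block positioned at (len(a), len(b)), which is what lets the loop emit any trailing unmatched elements.

```python
from difflib import SequenceMatcher

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]

# Each block says: a[block.a : block.a + block.size] equals
# b[block.b : block.b + block.size].
blocks = SequenceMatcher(None, a, b).get_matching_blocks()
for block in blocks:
    print(block)

# Elements of a or b that fall between two consecutive blocks are the
# ones zip_diff2 pairs with fillvalue.
```

With these inputs the blocks cover "foo" and "baz", so "bar" and "asd" from a end up paired with the fill value.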

How can I make it work on an arbitrary number of sequences?

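1 Answer:

Answer 0 (score: 0)

To get what you want, I think you first need to build a base sequence containing every value that appears in the given sequences. For that I wrote this code: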

def build_base_sequence(*sequences):
    # Getting the biggest sequence size.
    max_count = 0
    for sequence in sequences:
        max_count = max(max_count, len(sequence))

    # Normalizing the sequences to have all the same size.
    new_sequences = []
    for sequence in sequences:
        new_sequence = sequence + [None] * max_count
        new_sequences.append(new_sequence[:max_count])

    # Building the base sequence:
    base_sequence = []
    for values in zip(*new_sequences):
        for value in values:
            if value is None or value in base_sequence:
                continue
            base_sequence.append(value)

    return base_sequence
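
As a side note, the padding step can probably be delegated to itertools.zip_longest. The sketch below is meant to behave like build_base_sequence for list inputs; the name build_base_sequence_compact is made up for this illustration. Like the function above, it treats None as padding, so it assumes None is never a real value in the inputs.

```python
from itertools import zip_longest


def build_base_sequence_compact(*sequences):
    # Hypothetical compact variant: zip_longest pads the shorter
    # sequences with None, so no manual normalisation is needed.
    base_sequence = []
    for column in zip_longest(*sequences):
        for value in column:
            # Skip padding and values that were already collected.
            if value is not None and value not in base_sequence:
                base_sequence.append(value)
    return base_sequence


a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
print(build_base_sequence_compact(a, b, c))
```

The values are collected column by column, so the order of the base sequence depends on the position at which each value first appears in any input.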

You could use your function and call it multiple times. I found difflib.SequenceMatcher too complicated for this, though, so I wrote my own code:

def zip_diff(*sequences):
    base_sequence = build_base_sequence(*sequences)

    # Building new sequences based on base_sequence
    new_sequences = []
    for sequence in sequences:
        new_sequence = [None] * len(base_sequence)
        for value in sequence:
            new_sequence[base_sequence.index(value)] = value
        new_sequences.append(new_sequence)

    # Now let's yield the values
    for values in zip(*new_sequences):
        yield values

This is rather newbie / dummy / naive code, but hey, it does the job!

>>> a = ["foo", "bar", "baz", "asd"]
>>> b = ["foo", "baz"]
>>> c = ["foo", "bar"]
>>> for values in zip_diff(a, b, c):
...     print(values)
...
('foo', 'foo', 'foo')
('bar', None, 'bar')
('baz', 'baz', None)
('asd', None, None)
>>> 
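
One caveat with the naive version: every value gets a single slot in the base sequence, so repeated values within a sequence collapse into one row. For comparison, here is a rough sketch of the "call it multiple times" idea, folding the sequences in one at a time with SequenceMatcher.get_opcodes(). The name zip_diff_n and the keys bookkeeping are assumptions made for this illustration, not something from the question. The elements must be hashable, and the alignment is greedy, so the result can depend on the order in which the sequences are passed.

```python
from difflib import SequenceMatcher


def zip_diff_n(*sequences, fillvalue=None):
    """Sketch: align any number of sequences by folding them in one by one."""
    rows = []  # rows[i] holds one value (or fillvalue) per sequence seen so far
    keys = []  # keys[i] is the real value that row i stands for
    for width, seq in enumerate(sequences):
        seq = list(seq)
        new_rows, new_keys = [], []
        matcher = SequenceMatcher(None, keys, seq, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Existing rows that also occur in seq get the value appended.
                for i, j in zip(range(i1, i2), range(j1, j2)):
                    new_rows.append(rows[i] + [seq[j]])
                    new_keys.append(keys[i])
                continue
            # "delete" / "replace": existing rows have no partner in seq.
            for i in range(i1, i2):
                new_rows.append(rows[i] + [fillvalue])
                new_keys.append(keys[i])
            # "insert" / "replace": values of seq that need a new row.
            for j in range(j1, j2):
                new_rows.append([fillvalue] * width + [seq[j]])
                new_keys.append(seq[j])
        rows, keys = new_rows, new_keys
    return [tuple(row) for row in rows]


a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
for values in zip_diff_n(a, b, c):
    print(values)
```

For the three example sequences this yields the same four tuples as shown above.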

I hope this helps you in some way.