Aligning two lists by adding a special value for missing entries

Time: 2013-08-18 21:20:49

Tags: python

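I have two lists of values of the same sortable type. Both are sorted in ascending order, but (i) they differ in length, and (ii) an entry in one list may be missing from the other, and vice versa. I do know, however, that most values in one list are present in the other, and that neither list contains duplicates.

So we might have a situation like this: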

list1 = [value1-0, value1-1, value1-2, value1-3]
list2 = [value2-0, value2-1, value2-2]

If the values of the two lists happen to be ordered as follows:

value1-0 < (value1-1 = value2-0) < value2-1 < value1-2 < value1-3 < value2-2

then we can name the values in both lists according to their position in the combined sort order, for example:

valueA < valueB < valueC < valueD < valueE < valueF

so that the two lists can be written as:

list1 = [valueA, valueB, valueD, valueE]
list2 = [valueB, valueC, valueF]

Given this, I would like the lists to become:

new_list1 = [valueA,    valueB, "MISSING", valueD,    valueE,    "MISSING"]
new_list2 = ["MISSING", valueB, valueC,    "MISSING", "MISSING", valueF   ]

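Can anyone help?

Edit: The original question specifically mentioned datetime objects (hence the datetime-specific comments), but it has since been generalized to any sortable type.

3 answers:

Answer 0 (score: 5)

This question piqued my interest, so I wrote an overly generic solution.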

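Here is a function that:
  • aligns any number of sequences
  • works with iterators, so it can handle long (or infinite) sequences efficiently
  • supports duplicate values
  • is compatible with Python 2 and 3 (if I did not care about legacy Python versions, I would have used the signature align_iterables(*inputs, missing_value=None))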

import itertools

def align_iterables(inputs, missing=None):
    """Align sorted iterables

    Yields tuples with values from the respective `inputs`, placing
    `missing` if the value does not exist in the corresponding
    iterable.

    Example: align_iterables(['bc', 'bf', '', 'abf']) yields:
        (None, None, None, 'a')
        ('b', 'b', None, 'b')
        ('c', None, None, None)
        (None, 'f', None, 'f')
    """
    End = object()
    iterators = [itertools.chain(i, [End]) for i in inputs]
    values = [next(i) for i in iterators]
    while not all(v is End for v in values):
        smallest = min(v for v in values if v is not End)
        yield tuple(v if v == smallest else missing for v in values)
        values = [next(i) if v == smallest else v
                  for i, v in zip(iterators, values)]

# An adapter for the problem in the question:

def align_two_lists(list1, list2, missing="MISSING"):
    value = list(zip(*list(align_iterables([list1, list2], missing=missing))))
    if not value:
        return [[], []]
    else:
        a, b = value
        return [list(a), list(b)]

# A set of tests for the problem in the question:

if __name__ == '__main__':
    assert align_two_lists('abcef', 'abcdef', '_') == [['a', 'b', 'c', '_', 'e', 'f'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('a', 'abcdef', '_') == [['a', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('abcdef', 'a', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['a', '_', '_', '_', '_', '_']]
    assert align_two_lists('', 'abcdef', '_') == [['_', '_', '_', '_', '_', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('abcdef', '', '_') == [['a', 'b', 'c', 'd', 'e', 'f'], ['_', '_', '_', '_', '_', '_']]
    assert align_two_lists('ace', 'abcdef', '_') == [['a', '_', 'c', '_', 'e', '_'], ['a', 'b', 'c', 'd', 'e', 'f']]
    assert align_two_lists('bdf', 'ace', '_') == [['_', 'b', '_', 'd', '_', 'f'], ['a', '_', 'c', '_', 'e', '_']]
    assert align_two_lists('ace', 'bdf', '_') == [['a', '_', 'c', '_', 'e', '_'], ['_', 'b', '_', 'd', '_', 'f']]
    assert align_two_lists('aaacd', 'acd', '_') == [['a', 'a', 'a', 'c', 'd'], ['a', '_', '_', 'c', 'd']]
    assert align_two_lists('acd', 'aaacd', '_') == [['a', '_', '_', 'c', 'd'], ['a', 'a', 'a', 'c', 'd']]
    assert align_two_lists('', '', '_') == [[], []]

    list1 = ["datetimeA", "datetimeB", "datetimeD", "datetimeE"]
    list2 = ["datetimeB", "datetimeC", "datetimeD", "datetimeF"]

    new_list1 = ["datetimeA", "datetimeB", "MISSING", "datetimeD", "datetimeE", "MISSING"]
    new_list2 = ["MISSING", "datetimeB", "datetimeC", "datetimeD", "MISSING", "datetimeF"]

    assert align_two_lists(list1, list2) == [new_list1, new_list2]

# And some additional tests:

    # Also test multiple generators
    for expected, got in zip(
            [(None, None, None, 'a'),
             ('b', 'b', None, 'b'),
             ('c', None, None, None),
             (None, 'f', None, 'f')],
            align_iterables(['bc', 'bf', '', 'abf'])):
        assert expected == got

    assert list(align_iterables([])) == []

    # And an infinite generator
    for expected, got in zip(
            [(0, 0),
             ('X', 1),
             (2, 2),
             ('X', 3),
             (4, 4)],
            align_iterables([itertools.count(step=2), itertools.count()], missing='X')):
        assert expected == got
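
The Python 3-only signature mentioned in the bullet list above could look roughly like the sketch below. It is an illustration, not the answer's own code: the inputs become positional arguments and `missing_value` becomes keyword-only, while the alignment logic stays the same. The explicit `v is not end` checks are an added precaution so the sentinel is never compared with a real value.

```python
import itertools

def align_iterables(*inputs, missing_value=None):
    """Align sorted iterables (Python 3-only sketch of the signature above)."""
    end = object()  # sentinel marking an exhausted iterator
    iterators = [itertools.chain(it, [end]) for it in inputs]
    values = [next(it) for it in iterators]
    while not all(v is end for v in values):
        smallest = min(v for v in values if v is not end)
        # Emit the smallest value where present, missing_value elsewhere.
        yield tuple(v if v is not end and v == smallest else missing_value
                    for v in values)
        # Advance only the iterators that just produced the smallest value.
        values = [next(it) if v is not end and v == smallest else v
                  for it, v in zip(iterators, values)]

rows = list(align_iterables('bc', 'bf', '', 'abf'))
```

With this signature the call site no longer needs to wrap the inputs in a list, and `missing_value` cannot be passed positionally by accident.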

Answer 1 (score: 4)

How about something like this:

set1 = set(list1)
set2 = set(list2)
total = sorted(set1|set2)

new_list1 = [x if x in set1 else "MISSING" for x in total]
new_list2 = [x if x in set2 else "MISSING" for x in total]
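
As a rough illustration, the set-based approach can be wrapped in a function and tried on datetime values, which the question originally asked about. The function name `align_with_sets` and the sample dates are made up for this sketch. It assumes the values are hashable and that neither list contains duplicates, as the question states, because a set would silently drop repeated values.

```python
from datetime import datetime

def align_with_sets(list1, list2, missing="MISSING"):
    """Align two duplicate-free lists of hashable, sortable values."""
    set1, set2 = set(list1), set(list2)
    total = sorted(set1 | set2)  # combined sort order of all values
    return ([x if x in set1 else missing for x in total],
            [x if x in set2 else missing for x in total])

# Hypothetical sample data
a = [datetime(2013, 8, 1), datetime(2013, 8, 2), datetime(2013, 8, 4)]
b = [datetime(2013, 8, 2), datetime(2013, 8, 3)]
new_a, new_b = align_with_sets(a, b)
```

Because the union is sorted again, this version does not rely on the inputs already being in order, at the cost of an extra sort.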

Answer 2 (score: 1)

You could try:

new_list1 = []
new_list2 = []

i = j = 0
# Walk both lists in step; testing the bounds first also handles empty lists.
while i < len(list1) and j < len(list2):
    # Debug trace of the progress so far (valid on Python 2 and 3)
    print('1 ' + str(new_list1) + ' ' + str(i))
    print('2 ' + str(new_list2) + ' ' + str(j))
    if list1[i] == list2[j]:
        new_list1.append(list1[i])
        new_list2.append(list2[j])
        i += 1
        j += 1
    elif list1[i] > list2[j]:
        new_list1.append("MISSING")
        new_list2.append(list2[j])
        j += 1
    else:  # list1[i] < list2[j]
        new_list1.append(list1[i])
        new_list2.append("MISSING")
        i += 1
# One list is exhausted: pad the other side with "MISSING".
while i < len(list1):
    new_list1.append(list1[i])
    new_list2.append("MISSING")
    i += 1
while j < len(list2):
    new_list1.append("MISSING")
    new_list2.append(list2[j])
    j += 1

It looks like a lot of code, but it should work, and it loops over the lists only once.
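
To check the single-pass idea against the example in the question, the same merge can be packaged as a function. This is only a sketch: the name `align_merge` is hypothetical, and short strings stand in for the question's valueA to valueF.

```python
def align_merge(list1, list2, missing="MISSING"):
    """Single-pass alignment of two sorted, duplicate-free lists."""
    out1, out2 = [], []
    i = j = 0
    while i < len(list1) and j < len(list2):
        a, b = list1[i], list2[j]
        if a == b:            # present in both lists
            out1.append(a)
            out2.append(b)
            i += 1
            j += 1
        elif a > b:           # b is missing from list1
            out1.append(missing)
            out2.append(b)
            j += 1
        else:                 # a is missing from list2
            out1.append(a)
            out2.append(missing)
            i += 1
    # Whatever remains exists in only one of the lists.
    for a in list1[i:]:
        out1.append(a)
        out2.append(missing)
    for b in list2[j:]:
        out1.append(missing)
        out2.append(b)
    return out1, out2

new_list1, new_list2 = align_merge(["A", "B", "D", "E"], ["B", "C", "F"])
```

Unlike the set-based answer, this relies on both inputs already being sorted, and in exchange it needs no hashing and no additional sort.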