Looping to remove matching entries from parallel lists until no matches remain

Asked: 2016-08-20 02:31:31

Tags: python list python-3.x duplicates

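I have 3 parallel lists representing 3-tuples (date, description, amount), plus 3 new lists that I need to merge in without creating duplicate entries. Yes, the lists have overlapping entries, but the duplicates are not grouped together (it is not the case that all duplicates sit at 0 through x and all new entries at x through the end).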

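The problem I'm having is iterating the correct number of times to make sure all duplicates are caught. Instead, my code moves on while duplicates still remain.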

for x in dates:
    MoveNext = 'false'
    while MoveNext == 'false':
        Reiterate = 'false'
        for a, b in enumerate(descriptions):
            if Reiterate == 'true':
                break
            if b in edescriptions:
                eindex = [c for c, d in enumerate(edescriptions) if d == b]
                for e, f in enumerate(eindex):
                    if Reiterate == 'true':
                        break
                    if edates[f] == dates[a]:
                        if eamounts[f] == amounts[a]:
                            del dates[a]
                            del edates[f]
                            del descriptions[a]
                            del edescriptions[f]
                            del amounts[a]
                            del eamounts[f]
                            Reiterate = 'true'
                            break
                        else:
                            MoveNext = 'true'
                    else:
                        MoveNext = 'true'
            else:
                MoveNext = 'true'

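I don't know if it's a coincidence, but right now I'm removing half of the new items while the other half remain. In reality, far fewer should remain. This makes me think that for x in dates: is not iterating the correct number of times.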

1 Answer:

Answer 0 (score: 1)

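I suggest a different approach: instead of trying to delete items from a list (or worse, from several parallel lists), run through the input and yield only the data that passes your test, which in this case means data you have not seen before. This is easier with a single stream of input.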
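To see why deleting in place tends to misbehave, here is a minimal sketch with made-up data (the names delete_in_place, keep_unseen, and known are illustrative and do not come from the question). Deleting from a list while a for loop walks it shifts the remaining elements left, so the element right after each deletion is never examined. A generator that yields only the items you want avoids the problem entirely.

```python
def delete_in_place(items, known):
    """Buggy on purpose: deletes entries from `items` while iterating over it."""
    for index, item in enumerate(items):
        if item in known:
            del items[index]  # everything after `index` shifts left by one
    return items


def keep_unseen(items, known):
    """Yield only the items that are not in `known`; `items` is left untouched."""
    for item in items:
        if item not in known:
            yield item


known = {'a', 'b', 'c', 'd'}
print(delete_in_place(['a', 'b', 'c', 'd', 'e'], known))    # ['b', 'd', 'e']
print(list(keep_unseen(['a', 'b', 'c', 'd', 'e'], known)))  # ['e']
```

In the first call, every second known item survives. That is consistent with the "half removed" symptom in the question, although the question's nested loops are more involved than this sketch.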

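Your lists of data are begging to be made into objects, since each piece (like the date) is meaningless without the other two... at least for your current purposes. Below, I start by combining each triplet into a Record, an instance of collections.namedtuple. Named tuples are well suited to this kind of lightweight, one-off work.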
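As a brief illustration of why a namedtuple fits here (the sample values are invented): two records with equal fields compare equal and hash alike, so a set can detect duplicates without a hand-written comparison loop. This assumes every field is itself hashable, as strings and numbers are.

```python
import collections

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])

first = Record('2000-01-01', 'a', 0)
second = Record(date='2000-01-01', description='a', amount=0)
other = Record('2000-01-01', 'a', 1)

print(first == second)              # True: all three fields match
print(first == other)               # False: the amounts differ
print(len({first, second, other}))  # 2: the set keeps one copy of the duplicate
print(first.description)            # a: fields are readable by name
```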

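In the program below, build_records creates Record objects from your three input lists. dedup_records merges multiple streams of Record objects, using unique to filter out the duplicates. Keeping each function small (most of the main function is test data) makes each step easy to test.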

#!/usr/bin/env python3

import collections
import itertools


Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


def unique(records):
    '''
    Yields only the unique Records in the given iterable of Records.
    '''
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record


def dedup_records(*record_iterables):
    '''
    Yields unique Records from multiple iterables of Records, preserving the
    order of first appearance.
    '''
    all_records = itertools.chain(*record_iterables)
    yield from unique(all_records)


def build_records(dates, descriptions, amounts):
    '''
    Yields Record objects built from each date-description-amount triplet.
    '''
    for args in zip(dates, descriptions, amounts):
        yield Record(*args)


def main():
    # Sample data
    dates_old = [
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      ]
    dates_new = [
      '2000-01-01',
      '2001-01-01',
      '2002-01-01',
      '2003-01-01',
      '2003-01-01',
      '2002-01-01',
      '2001-01-01',
      '2000-01-01',
      ]
    descriptions_old = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
    descriptions_new = ['b', 'b', 'c', 'a', 'a', 'c', 'd', 'd']
    amounts_old = [0, 1, 0, 1, 0, 1, 0, 1]
    amounts_new = [0, 0, 0, 0, 1, 1, 1, 1]
    old = [dates_old, descriptions_old, amounts_old]
    new = [dates_new, descriptions_new, amounts_new]

    for record in dedup_records(build_records(*old), build_records(*new)):
        print(record)


if __name__ == '__main__':
    main()

This reduces the 16 input Records to 11:

Record(date='2000-01-01', description='a', amount=0)
Record(date='2001-01-01', description='b', amount=1)
Record(date='2002-01-01', description='c', amount=0)
Record(date='2003-01-01', description='d', amount=1)
Record(date='2000-01-01', description='b', amount=0)
Record(date='2001-01-01', description='b', amount=0)
Record(date='2003-01-01', description='a', amount=0)
Record(date='2003-01-01', description='a', amount=1)
Record(date='2002-01-01', description='c', amount=1)
Record(date='2001-01-01', description='d', amount=1)
Record(date='2000-01-01', description='d', amount=1)
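If the rest of your program still expects three parallel lists, the deduplicated records can be split apart again. The following is only a sketch under that assumption: split_records is a hypothetical helper, not part of the program above, and the Record definition is repeated so the snippet runs on its own.

```python
import collections

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


def split_records(records):
    """Return (dates, descriptions, amounts) lists from an iterable of Records."""
    records = list(records)
    if not records:
        # zip(*[]) yields nothing, so handle the empty case explicitly.
        return [], [], []
    dates, descriptions, amounts = (list(column) for column in zip(*records))
    return dates, descriptions, amounts


merged = [
    Record('2000-01-01', 'a', 0),
    Record('2001-01-01', 'b', 1),
]
dates, descriptions, amounts = split_records(merged)
print(dates)         # ['2000-01-01', '2001-01-01']
print(descriptions)  # ['a', 'b']
print(amounts)       # [0, 1]
```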

Note that the yield from ... syntax requires Python 3.3 or later.
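On an older interpreter, the yield from line can be replaced with an explicit loop, which is equivalent for plain iteration like this. A minimal sketch of that substitution follows; it uses plain tuples as invented sample data, and unique is repeated from the program above only so the snippet is self-contained.

```python
import itertools


def unique(records):
    """Yield each record the first time it appears."""
    seen = set()
    for record in records:
        if record not in seen:
            seen.add(record)
            yield record


def dedup_records(*record_iterables):
    """Explicit-loop variant of dedup_records that avoids `yield from`."""
    for record in unique(itertools.chain(*record_iterables)):
        yield record


old = [('2000-01-01', 'a', 0), ('2001-01-01', 'b', 1)]
new = [('2001-01-01', 'b', 1), ('2002-01-01', 'c', 0)]
print(list(dedup_records(old, new)))
```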