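I have 3 parallel lists representing 3-tuples (date, description, amount), and 3 new lists that I need to merge in without creating duplicate entries. Yes, the lists have overlapping entries, but the duplicates are not grouped together (it is not the case that all duplicates sit at 0 through x and all new entries at x through the end).
The problem I'm having is iterating the right number of times to make sure I catch every duplicate. Instead, my code moves on while duplicates remain.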

    for x in dates:
        MoveNext = 'false'
        while MoveNext == 'false':
            Reiterate = 'false'
            for a, b in enumerate(descriptions):
                if Reiterate == 'true':
                    break
                if b in edescriptions:
                    eindex = [c for c, d in enumerate(edescriptions) if d == b]
                    for e, f in enumerate(eindex):
                        if Reiterate == 'true':
                            break
                        if edates[f] == dates[a]:
                            if eamounts[f] == amounts[a]:
                                del dates[a]
                                del edates[f]
                                del descriptions[a]
                                del edescriptions[f]
                                del amounts[a]
                                del eamounts[f]
                                Reiterate = 'true'
                                break
                            else:
                                MoveNext = 'true'
                        else:
                            MoveNext = 'true'
                else:
                    MoveNext = 'true'

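I don't know if it's a coincidence, but I'm currently deleting half of the new items while the other half remain. In reality, far fewer than that should be left over. That makes me think `for x in dates:` isn't iterating the right number of times.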
Answer 0 (score: 1)
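I recommend a different approach: instead of trying to delete items from a list (or, worse, from several parallel lists), run through the input and `yield` only the data that passes your test, which in this case means data you have not seen before. This is much easier with a single stream of input.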
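To illustrate why deleting in place is fragile, here is a minimal sketch using made-up data (`known`, `pending`, and `keep_unseen` are hypothetical names, not from the question). The question's code is more involved, so this is only a likely explanation for the "half remain" symptom: removing an element while a `for` loop walks the same list shifts the remaining elements left, and the loop skips the one that slid into the freed slot.

```python
# Hypothetical sample data: every pending entry duplicates a known one,
# so a correct de-duplication should leave nothing behind.
known = {'a', 'b', 'c', 'd'}
pending = ['a', 'b', 'c', 'd']

# Fragile: mutating the list that the loop is iterating over.
for entry in pending:
    if entry in known:
        pending.remove(entry)   # shifts later items left; the next one is skipped

print(pending)   # half of the duplicates survive


def keep_unseen(entries, known):
    '''Yield only the entries that are not in `known`; nothing is deleted.'''
    for entry in entries:
        if entry not in known:
            yield entry


filtered = list(keep_unseen(['a', 'b', 'c', 'd'], known))
print(filtered)
```

The generator version never mutates its input, so there is no index to fall out of step.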
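Your lists of data are crying out to be turned into objects, since each piece (the date, say) is meaningless without the other two, at least for your current purpose. Below, I start by combining each triple into a `Record`, an instance of a `collections.namedtuple`. They are great for this kind of use-once-and-discard work.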
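As a small sketch of why a named tuple suits this job (the field values here are made-up sample data): two `Record` instances with equal fields compare equal and hash identically, which is what lets a `set` recognize a duplicate in a single lookup.

```python
import collections

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])

# Sample values for illustration only.
first = Record('2000-01-01', 'a', 0)
second = Record(date='2000-01-01', description='a', amount=0)

print(first == second)        # field-by-field equality, like a plain tuple
print(len({first, second}))   # a set keeps only one of two equal Records
print(first.description)      # fields are readable by name
```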
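In the program below, `build_records` creates `Record` objects from your three input lists. `dedup_records` merges multiple streams of `Record` objects and uses `unique` to filter out the duplicates. Keeping each function small (most of the `main` function is test data) makes each step easy to test.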

    #!/usr/bin/env python3

    import collections
    import itertools

    Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


    def unique(records):
        '''
        Yields only the unique Records in the given iterable of Records.
        '''
        seen = set()
        for record in records:
            if record not in seen:
                seen.add(record)
                yield record
        return


    def dedup_records(*record_iterables):
        '''
        Yields unique Records from multiple iterables of Records, preserving the
        order of first appearance.
        '''
        all_records = itertools.chain(*record_iterables)
        yield from unique(all_records)
        return


    def build_records(dates, descriptions, amounts):
        '''
        Yields Record objects built from each date-description-amount triplet.
        '''
        for args in zip(dates, descriptions, amounts):
            yield Record(*args)
        return


    def main():
        # Sample data
        dates_old = [
            '2000-01-01',
            '2001-01-01',
            '2002-01-01',
            '2003-01-01',
            '2000-01-01',
            '2001-01-01',
            '2002-01-01',
            '2003-01-01',
        ]
        dates_new = [
            '2000-01-01',
            '2001-01-01',
            '2002-01-01',
            '2003-01-01',
            '2003-01-01',
            '2002-01-01',
            '2001-01-01',
            '2000-01-01',
        ]
        descriptions_old = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
        descriptions_new = ['b', 'b', 'c', 'a', 'a', 'c', 'd', 'd']
        amounts_old = [0, 1, 0, 1, 0, 1, 0, 1]
        amounts_new = [0, 0, 0, 0, 1, 1, 1, 1]
        old = [dates_old, descriptions_old, amounts_old]
        new = [dates_new, descriptions_new, amounts_new]

        for record in dedup_records(build_records(*old), build_records(*new)):
            print(record)
        return


    if '__main__' == __name__:
        main()

This reduces the 16 input `Record`s to 11:

    Record(date='2000-01-01', description='a', amount=0)
    Record(date='2001-01-01', description='b', amount=1)
    Record(date='2002-01-01', description='c', amount=0)
    Record(date='2003-01-01', description='d', amount=1)
    Record(date='2000-01-01', description='b', amount=0)
    Record(date='2001-01-01', description='b', amount=0)
    Record(date='2003-01-01', description='a', amount=0)
    Record(date='2003-01-01', description='a', amount=1)
    Record(date='2002-01-01', description='c', amount=1)
    Record(date='2001-01-01', description='d', amount=1)
    Record(date='2000-01-01', description='d', amount=1)

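If the goal is to keep only the genuinely new entries rather than the merged whole, the same "seen set" idea can be adapted. The following is a sketch under that assumption; `only_new` is a hypothetical helper that is not part of the answer's program, and the records are made-up sample data.

```python
import collections

Record = collections.namedtuple('Record', ['date', 'description', 'amount'])


def only_new(existing, incoming):
    '''
    Yields each incoming Record that appears neither in `existing` nor
    earlier in `incoming`, preserving order of first appearance.
    '''
    seen = set(existing)
    for record in incoming:
        if record not in seen:
            seen.add(record)
            yield record


# Sample data for illustration only.
existing = [Record('2000-01-01', 'a', 0), Record('2001-01-01', 'b', 1)]
incoming = [
    Record('2001-01-01', 'b', 1),   # already present in `existing`
    Record('2002-01-01', 'c', 0),   # new
    Record('2002-01-01', 'c', 0),   # repeat of the previous new entry
]

fresh = list(only_new(existing, incoming))
print(fresh)
```

Seeding the set with the existing records means they are recognized as duplicates without ever being emitted.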
Note that the `yield from ...` syntax requires Python 3.3 or later.
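On an older interpreter, one possible workaround is to spell the delegation out as an explicit loop. For plain iteration like this, the loop yields the same items in the same order as `yield from`. The sketch below uses small integers as stand-in data instead of `Record` objects to stay short.

```python
import itertools


def unique(items):
    '''Yields each item the first time it appears.'''
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def dedup_records(*record_iterables):
    '''Same result as the `yield from` version, written as a loop.'''
    for record in unique(itertools.chain(*record_iterables)):
        yield record


# Stand-in data for illustration only.
result = list(dedup_records([1, 2, 2], [3, 1]))
print(result)
```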