I have two lists and want to generate the combinations that meet the following condition:
a = [1,7,12,50,51,52,57,59,60,61,67,96,59,58]
b = list(range(1, 201))  # 1, 2, 3, ..., 200
combination_a = [(p,q,r) for p in a for q in a for r in a]
combination_b = [(p,q,r) for p in b for q in b for r in b
                 if (p,q,r) not in combination_a]
print(combination_b)
With this much data, how should I solve the program's memory problem? And how can I get the output into Excel?
import xlsxwriter

workbook = xlsxwriter.Workbook('Sample.xlsx', {'constant_memory': True})
worksheet = workbook.add_worksheet()
for row, group in enumerate(combination_b):
    for col in range(3):
        worksheet.write(row, col, group[col])
workbook.close()
Answer 0 (score: 1)
A few optimizations. The first is to keep only the smaller set in memory, for example:
import itertools

combination_a = set(itertools.product(a, repeat=3))
for triple in itertools.product(b, repeat=3):
    if triple not in combination_a:
        print(triple)
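The same set-based filter can be wrapped as a generator function so callers stream the results instead of printing them. This is a minimal sketch; the function name `triples_not_in` and the tiny sample lists are illustrative assumptions, not part of the answer.

```python
import itertools


def triples_not_in(a, b):
    """Yield every (p, q, r) drawn from b that is not also a triple drawn from a."""
    # Only the small product of `a` is materialized, as a set for fast lookups.
    excluded = set(itertools.product(a, repeat=3))
    for triple in itertools.product(b, repeat=3):
        if triple not in excluded:
            yield triple


# Tiny illustrative inputs: 3**3 = 27 candidates, minus the 2**3 = 8 built from small_a.
small_a = [1, 2]
small_b = [1, 2, 3]
result = list(triples_not_in(small_a, small_b))
print(len(result))  # 19
print(result[0])    # (1, 1, 3)
```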
Otherwise, if you expect both lists to be large, you can avoid materializing the combinations at all:
set_a = set(a)
for triple in itertools.product(b, repeat=3):
    if not all(t in set_a for t in triple):
        print(triple)
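The `all()` test works because a triple belongs to `itertools.product(a, repeat=3)` exactly when each of its three elements is in `a`. A small sanity check of that equivalence, using arbitrary sample lists chosen for illustration:

```python
import itertools

a = [1, 7, 12]
b = list(range(1, 16))

set_a = set(a)
full_product = set(itertools.product(a, repeat=3))

# Per-element test: the product of `a` is never built.
via_elements = [t for t in itertools.product(b, repeat=3)
                if not all(x in set_a for x in t)]
# Reference test against the materialized product of `a`.
via_product = [t for t in itertools.product(b, repeat=3)
               if t not in full_product]

print(via_elements == via_product)  # True
print(len(via_elements))            # 3348, i.e. 15**3 - 3**3
```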
And if combination_a does not fit in memory, you are in trouble anyway: iterating over that many items would make the total running time astronomical.
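I'd suggest saving to CSV rather than MS Excel format: with more than a million combinations, the output exceeds Excel's limit of 1,048,576 rows per worksheet.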
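The size of the output follows from the inputs: `b` holds 200 distinct values and `a` holds 13 distinct values (59 appears twice), and every value of `a` is also in `b`, so the filter keeps 200³ − 13³ triples. A quick check against Excel's per-worksheet limit; the constant name `EXCEL_MAX_ROWS` is only a label for illustration:

```python
a = [1, 7, 12, 50, 51, 52, 57, 59, 60, 61, 67, 96, 59, 58]
b = list(range(1, 201))

EXCEL_MAX_ROWS = 1_048_576  # maximum rows in one .xlsx worksheet

# Duplicates in `a` only produce duplicate triples, so count distinct values.
# This relies on every value of `a` also being in `b`, which holds here.
n_rows = len(set(b)) ** 3 - len(set(a)) ** 3
print(n_rows)                   # 7997803
print(n_rows > EXCEL_MAX_ROWS)  # True
```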
Writing the CSV in the loop is then just:
import csv
import itertools

with open('combinations_b.csv', 'w', newline='') as fd:
    out = csv.writer(fd)
    out.writerow(['a', 'b', 'c'])
    for triple in itertools.product(b, repeat=3):
        if triple not in combination_a:
            out.writerow(triple)
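A self-contained variant of the CSV loop, written as a function that accepts any text file object so it can be tried against an in-memory buffer. The name `write_missing_triples` and the use of `io.StringIO` are illustrative assumptions; for a real file, open it with `newline=''` as the `csv` module documentation recommends.

```python
import csv
import io
import itertools


def write_missing_triples(a, b, fd):
    """Stream each triple of b that is not a triple of a to fd as CSV; return the row count."""
    excluded = set(itertools.product(a, repeat=3))  # small set, kept in memory
    out = csv.writer(fd)
    out.writerow(['a', 'b', 'c'])
    count = 0
    for triple in itertools.product(b, repeat=3):  # lazy, never materialized
        if triple not in excluded:
            out.writerow(triple)
            count += 1
    return count


buf = io.StringIO()
n = write_missing_triples([1, 2], [1, 2, 3], buf)
lines = buf.getvalue().splitlines()
print(n)          # 19
print(lines[:2])  # ['a,b,c', '1,1,3']
```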
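Answer 1 (score: 0)
This prints the triples one at a time, and because it is built on generators it never holds all the combinations in memory.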
import itertools

for triple in itertools.product(b, repeat=3):
    # A fresh product of a is created and scanned for every triple.
    if triple not in itertools.product(a, repeat=3):
        print(triple)
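A sketch of the same lazy pipeline with one assumed change: the product of `a` is built once as a set instead of being re-created and scanned for every triple (with 14 entries in `a`, each `not in` test on a fresh product can walk up to 14³ = 2,744 tuples). `itertools.islice` then shows that only the items actually consumed are computed.

```python
import itertools

a = [1, 7, 12, 50, 51, 52, 57, 59, 60, 61, 67, 96, 59, 58]
b = range(1, 201)

excluded = set(itertools.product(a, repeat=3))  # built once, not per triple

# Generator expression: nothing is computed until it is iterated.
missing = (t for t in itertools.product(b, repeat=3) if t not in excluded)

# Pull just the first five triples; the rest are never produced.
first_five = list(itertools.islice(missing, 5))
print(first_five)  # [(1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 1, 6)]
```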
After the OP's update...
You can simply iterate over the generator and write each triple as a row:
for row, group in enumerate(triple for triple in itertools.product(b, repeat=3)
                            if triple not in itertools.product(a, repeat=3)):
    worksheet.write_row(row, 0, group)
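A single .xlsx worksheet holds at most 1,048,576 rows, so writing every triple to one sheet will run out of rows. One way around it, sketched here as an assumption rather than part of the answer, is to map each item's position to a (sheet, row) pair with `divmod`, which keeps the pipeline streaming. The helper name `sheet_positions` is hypothetical, and the demo uses a tiny `max_rows`.

```python
EXCEL_MAX_ROWS = 1_048_576  # maximum rows in one .xlsx worksheet


def sheet_positions(items, max_rows=EXCEL_MAX_ROWS):
    """Lazily pair each item with (sheet_index, row) so no sheet exceeds max_rows."""
    for i, item in enumerate(items):
        sheet_index, row = divmod(i, max_rows)
        yield sheet_index, row, item


# Demo with 10 items and at most 4 rows per sheet.
positions = list(sheet_positions("abcdefghij", max_rows=4))
print(positions[4])   # (1, 0, 'e')
print(positions[-1])  # (2, 1, 'j')
```

With xlsxwriter, a new sheet would be requested from `workbook.add_worksheet()` whenever `sheet_index` changes, and each triple written with `worksheet.write_row(row, 0, item)`. For the 7,997,803 triples in the question, that works out to eight sheets.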