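Python: Memory problem: combinations based on a condition

Time: 2018-10-30 21:19:06

Tags: python memory-management out-of-memory

I have two lists and want to generate the combinations that satisfy the following condition: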

a = [1,7,12,50,51,52,57,59,60,61,67,96,59,58]
b = list(range(1, 201))  # 1, 2, 3, ..., 200

combination_a = [(p,q,r) for p in a for q in a for r in a]

combination_b = [(p,q,r) for p in b for q in b for r in b
                          if (p,q,r) not in combination_a]
print(combination_b)

How should I deal with the memory problem in this program when the data is large? And how can I get the output into Excel?

import xlsxwriter

workbook = xlsxwriter.Workbook('Sample.xlsx', {'constant_memory': True})
worksheet = workbook.add_worksheet()

row = 0

for row, group in enumerate(combo):
  for col in range(3):
    worksheet.write(row, col, group[col])

workbook.close()

2 Answers:

Answer 0: (score: 1)

A few optimizations. The first is to keep the smaller set in memory, for example:

import itertools

combination_a = set(itertools.product(a, repeat=3))
for triple in itertools.product(b, repeat=3):
  if triple not in combination_a:
    print(triple)

Otherwise, if you expect both to be large, you can avoid materializing the combinations at all:

set_a = set(a)
for triple in itertools.product(b, repeat=3):
  if not all((t in set_a) for t in triple):
    print(triple)
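
As a sanity check on the snippet above, here is a minimal sketch that runs both filters on small made-up lists and confirms they select the same triples. The names small_a and small_b are illustrative only, not the lists from the question. It relies on the fact that a triple belongs to itertools.product(a, repeat=3) exactly when all three of its elements are in a.

```python
import itertools

# Illustrative inputs (assumed for this sketch; not the lists from the question).
small_a = [1, 3, 3, 5]
small_b = list(range(1, 7))  # 1..6

# Approach 1: materialize every triple built from small_a in a set.
combination_a = set(itertools.product(small_a, repeat=3))
via_triples = [t for t in itertools.product(small_b, repeat=3)
               if t not in combination_a]

# Approach 2: keep only the distinct elements of small_a.
set_a = set(small_a)
via_elements = [t for t in itertools.product(small_b, repeat=3)
                if not all(x in set_a for x in t)]

print(via_triples == via_elements)  # both filters keep the same triples
print(len(via_elements))            # 6**3 - 3**3 triples remain
```

The second approach stores only len(set(a)) values instead of len(set(a)) ** 3 tuples, which is the whole point when a is large.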

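That said, if combination_a cannot fit in memory, you will be in a lot of trouble anyway: iterating over that many items means the total run time will be astronomical.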

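I'd suggest saving to CSV rather than MS Excel format. With the lists in the question there are roughly 8 million combinations (200^3 = 8,000,000 before filtering), well beyond the 1,048,576 rows a single Excel worksheet can hold.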
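
To put a number on that estimate, the short calculation below counts the triples for the lists in the question. The constant 1,048,576 is the row limit of a single .xlsx worksheet; the variable names are only illustrative.

```python
a = [1, 7, 12, 50, 51, 52, 57, 59, 60, 61, 67, 96, 59, 58]
b = range(1, 201)

distinct_a = len(set(a))      # 59 appears twice in a, so duplicates collapse
total = len(b) ** 3           # every (p, q, r) drawn from b
# Every value of a also occurs in b, so each excluded triple is counted once.
excluded = distinct_a ** 3    # triples made only of values from a
remaining = total - excluded

EXCEL_MAX_ROWS = 1_048_576    # rows per worksheet in the .xlsx format

print(distinct_a, total, excluded, remaining)
print(remaining > EXCEL_MAX_ROWS)
```

Only a few thousand triples are filtered out, so the output is still close to 8 million rows.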

Writing the rows in a loop is just a matter of:

import csv
import itertools

# newline='' stops the csv module from emitting blank lines on Windows
with open('combinations_b.csv', 'w', newline='') as fd:
  out = csv.writer(fd)
  out.writerow(['a', 'b', 'c'])
  for triple in itertools.product(b, repeat=3):
    if triple not in combination_a:
      out.writerow(triple)
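
Putting the pieces of this answer together, the following is a minimal self-contained sketch rather than a definitive implementation. The helper name write_filtered_triples and the header names p, q, r are assumptions made for illustration. It uses the set-of-elements filter so that no list of triples is ever held in memory, and it writes to any text file object.

```python
import csv
import io
import itertools


def write_filtered_triples(a, b, out_file):
    """Write every triple from b that is not made solely of values from a.

    Hypothetical helper for illustration; returns the number of data rows.
    """
    set_a = set(a)
    writer = csv.writer(out_file)
    writer.writerow(['p', 'q', 'r'])
    count = 0
    # itertools.product is lazy, so only one triple exists at a time.
    for triple in itertools.product(b, repeat=3):
        if not all(x in set_a for x in triple):
            writer.writerow(triple)
            count += 1
    return count


# Small illustrative run into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
rows_written = write_filtered_triples([1, 2], [1, 2, 3], buffer)
print(rows_written)
```

For real output, pass a file opened with open('combinations_b.csv', 'w', newline='') in place of the buffer.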

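Answer 1: (score: 0)

This prints the triples one at a time; because both products are generated lazily, memory use stays small. Note that each membership test scans itertools.product(a, repeat=3) from the start, so it is slower than a set lookup.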

for triple in itertools.product(b, repeat=3):
    if triple not in itertools.product(a, repeat=3):
        print(triple)

After the OP's update ...

You can simply iterate over the generator:

for row, group in enumerate(triple for triple in itertools.product(b, repeat=3)
                            if triple not in itertools.product(a, repeat=3)):
    worksheet.write_row(row, 0, group)  # xlsxwriter has write_row(row, col, data), not writerow