Python generator to evenly split iteration over the upper triangle of a matrix (for parallelization)

Time: 2017-10-28 01:18:09

Tags: python optimization generator

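I'm using the code below to iterate over the upper triangle of a matrix in parallel, but I'd rather do this without instantiating the full list of index pairs.

The goal is to process every entry in the upper triangle of the matrix, with that processing parallelized. Also note that I'm open to third-party libraries (numpy, etc.) if they have tools that would help.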

import itertools
import multiprocessing

def process_pairs(cur_pairs):
    for i, j in cur_pairs:
        pass  # do some stuff

n_processes = 4
n = 1000  # num cols/rows in matrix
# Python 2 (xrange); this list holds every index pair in memory
pairs = list(itertools.combinations(xrange(n), 2))
per_chunk = int(round(len(pairs) / float(n_processes)))
pair_chunks = [pairs[i*per_chunk:(i+1)*per_chunk] for i in xrange(n_processes)]
# args must be a tuple, so each chunk is wrapped as (chunk,)
processes = [multiprocessing.Process(target=process_pairs, args=(chunk,))
             for chunk in pair_chunks]
for p in processes:
    p.start()
for p in processes:
    p.join()

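Any clever ideas for expressing this as a generator (i.e., without generating all the index pairs up front)? As written, all the pairs have to be loaded into memory, and for very large n that is a memory hit I'd like to avoid.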

2 answers:

Answer 0 (score: 1)

Maybe something like this (converted to Python 3):

from itertools import combinations_with_replacement, islice, tee

n_processes = 3
n = 10  # num cols/rows in matrix

pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
  islice(p, i, None, n_processes)
  for i, p in enumerate(tee(pairs, n_processes))
]

print(pair_chunks)
print([list(x) for x in pair_chunks])

Output:

[<itertools.islice object at 0x7f2149fbe138>, <itertools.islice object at 0x7f2149fbecc8>, <itertools.islice object at 0x7f2149fbe228>]
[[(0, 1), (0, 4), (0, 7), (1, 2), (1, 5), (1, 8), (2, 4), (2, 7), (3, 4), (3, 7), (4, 5), (4, 8), (5, 7), (6, 7), (7, 8)], [(0, 2), (0, 5), (0, 8), (1, 3), (1, 6), (1, 9), (2, 5), (2, 8), (3, 5), (3, 8), (4, 6), (4, 9), (5, 8), (6, 8), (7, 9)], [(0, 3), (0, 6), (0, 9), (1, 4), (1, 7), (2, 3), (2, 6), (2, 9), (3, 6), (3, 9), (4, 7), (5, 6), (5, 9), (6, 9), (8, 9)]]

tee duplicates the generator, and islice then builds a new iterator from each copy that starts at a different offset and advances n_processes steps at a time.
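One caveat: tee buffers every item that one copy has consumed and another has not, so draining the chunks one after another (as the list(x) calls above do) can end up holding most of the pairs in memory. A minimal sketch of an alternative, assuming each worker can rebuild the pair sequence itself from n, is to apply islice directly to a fresh combinations iterator. The name strided_pairs is made up for illustration.

```python
from itertools import combinations, islice

def strided_pairs(n, worker, n_workers):
    """Return an iterator over every n_workers-th pair (i, j), i < j,
    starting at offset `worker` (hypothetical helper)."""
    # No tee: each call creates its own combinations iterator, so nothing is buffered.
    return islice(combinations(range(n), 2), worker, None, n_workers)

n, n_workers = 10, 3
chunks = [list(strided_pairs(n, w, n_workers)) for w in range(n_workers)]
print([len(c) for c in chunks])  # [15, 15, 15]
print(chunks[0][:4])             # [(0, 1), (0, 4), (0, 7), (1, 2)]
```

The trade-off is that every worker still steps through the whole pair sequence and discards the pairs it does not own, so memory stays constant but the generation work is repeated in each worker.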

Or a complete example using processes:

from multiprocessing import Process
from itertools import combinations_with_replacement, islice, tee

n_processes = 3
n = 10  # num cols/rows in matrix

pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
    islice(p, i, None, n_processes)
    for i, p in enumerate(tee(pairs, n_processes))
]

def process_pairs(i, pair_chunk):
    print('process %d received type %s' % (i, type(pair_chunk)))
    for x in pair_chunk:
        print('process %d processing %s' % (i, x))

processes = [
    Process(target=process_pairs, args=[i, pair_chunk])
    for i, pair_chunk in enumerate(pair_chunks)
]
for p in processes:
    p.start()

for p in processes:
    p.join()

Output:

process 0 received type <class 'itertools.islice'>
process 1 received type <class 'itertools.islice'>
process 1 processing (0, 2)
process 1 processing (0, 5)
process 1 processing (0, 8)
process 1 processing (1, 3)
process 1 processing (1, 6)
process 1 processing (1, 9)
process 1 processing (2, 5)
process 1 processing (2, 8)
process 1 processing (3, 5)
process 0 processing (0, 1)
...
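Handing iterator objects to Process like this relies on the child inheriting the parent's memory (the fork start method). Under the spawn start method the arguments are pickled, and generators cannot be pickled. A rough sketch of a workaround, assuming it is acceptable to pass only integers to each worker: give every worker a contiguous range of positions in the pair sequence and let it generate its own pairs. Both chunk_bounds and pairs_in_range are hypothetical helpers, not part of any library.

```python
def chunk_bounds(total, n_chunks):
    """Yield (start, stop) ranges that split `total` items into n_chunks
    contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(total, n_chunks)
    start = 0
    for k in range(n_chunks):
        stop = start + base + (1 if k < extra else 0)
        yield start, stop
        start = stop

def pairs_in_range(n, start, stop):
    """Yield pairs (i, j), i < j < n, whose position in row-major order
    lies in [start, stop)."""
    # Row i holds the n - 1 - i pairs (i, i+1) .. (i, n-1); skip whole rows first.
    i, offset = 0, start
    while i < n - 1 and offset >= n - 1 - i:
        offset -= n - 1 - i
        i += 1
    j = i + 1 + offset
    for _ in range(stop - start):
        if i >= n - 1:  # ran past the last pair
            return
        yield (i, j)
        j += 1
        if j == n:      # end of row: move to the next one
            i += 1
            j = i + 1

n, n_workers = 10, 4
total = n * (n - 1) // 2  # number of pairs above the diagonal
bounds = list(chunk_bounds(total, n_workers))
print(bounds)                            # [(0, 12), (12, 23), (23, 34), (34, 45)]
print(list(pairs_in_range(n, 12, 15)))   # [(1, 5), (1, 6), (1, 7)]
```

Each Process could then be started with args=(n, start, stop) and call pairs_in_range itself, so only three integers cross the process boundary.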

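Answer 1 (score: 0)

Presumably, if you don't want the indices, the generator will need to return the values directly.

Likewise, it sounds as though you don't want to flatten the upper triangle, so the row vectors need to stay distinct.

Given those presumed requirements, here is a generator that yields successive row-vector slices: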

>>> def generate_upper_triangular(m):
        for i, row in enumerate(m):
            yield row[i:]

>>> m = [[1, 2, 3],
         [0, 5, 6],
         [0, 0, 9]]
>>> for vec in generate_upper_triangular(m):
        print(vec)

[1, 2, 3]
[5, 6]
[9]
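To tie this back to the parallelization goal: the row slices get shorter as the row index grows, so giving each worker a contiguous block of rows would leave the first worker with most of the work. A rough sketch, assuming the matrix is a list of lists as above and that dealing rows out round-robin is balanced enough; rows_for_worker is a hypothetical name.

```python
def rows_for_worker(m, worker, n_workers):
    """Yield (i, m[i][i:]) for every n_workers-th row, starting at row `worker`."""
    for i in range(worker, len(m), n_workers):
        yield i, m[i][i:]

m = [[1, 2, 3, 4],
     [0, 5, 6, 7],
     [0, 0, 8, 9],
     [0, 0, 0, 10]]
for worker in range(2):
    print(worker, list(rows_for_worker(m, worker, 2)))
# 0 [(0, [1, 2, 3, 4]), (2, [8, 9])]
# 1 [(1, [5, 6, 7]), (3, [10])]
```

Slicing with m[i][i + 1:] instead would skip the diagonal, which matches the i < j pairs in the question.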