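I'm using the following code to iterate over the upper triangular part of a matrix in parallel, but I'd rather do it without instantiating the full list of index pairs.
The goal is to process every item in the upper triangular part of the matrix, but to parallelize that processing. Note also that I can use third-party libraries (numpy, etc.) if they have tools that would help.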
import itertools
import math
import multiprocessing

def process_pairs(cur_pairs):
    for i, j in cur_pairs:
        pass  # do some stuff

if __name__ == '__main__':
    n_processes = 4
    n = 1000  # num cols/rows in matrix
    # materializes all n*(n-1)/2 index pairs in memory
    pairs = list(itertools.combinations(xrange(n), 2))
    # ceil, so the last chunk picks up any remainder
    per_chunk = int(math.ceil(len(pairs) / float(n_processes)))
    pair_chunks = [pairs[i*per_chunk:(i+1)*per_chunk] for i in xrange(n_processes)]
    processes = [multiprocessing.Process(target=process_pairs, args=(chunk,))
                 for chunk in pair_chunks]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
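Any clever ideas for expressing this as a generator (i.e., without generating all the index pairs up front)? As written, the pairs have to be loaded into memory, and if n is very large, that is the memory hit I want to avoid.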
Answer 0 (score: 1)
Maybe something like this (converted to Python 3):
from itertools import combinations_with_replacement, islice, tee
n_processes = 3
n = 10 # num cols/rows in matrix
pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
    islice(p, i, None, n_processes)
    for i, p in enumerate(tee(pairs, n_processes))
]
print(pair_chunks)
print([list(x) for x in pair_chunks])
Output:
[<itertools.islice object at 0x7f2149fbe138>, <itertools.islice object at 0x7f2149fbecc8>, <itertools.islice object at 0x7f2149fbe228>]
[[(0, 1), (0, 4), (0, 7), (1, 2), (1, 5), (1, 8), (2, 4), (2, 7), (3, 4), (3, 7), (4, 5), (4, 8), (5, 7), (6, 7), (7, 8)], [(0, 2), (0, 5), (0, 8), (1, 3), (1, 6), (1, 9), (2, 5), (2, 8), (3, 5), (3, 8), (4, 6), (4, 9), (5, 8), (6, 8), (7, 9)], [(0, 3), (0, 6), (0, 9), (1, 4), (1, 7), (2, 3), (2, 6), (2, 9), (3, 6), (3, 9), (4, 7), (5, 6), (5, 9), (6, 9), (8, 9)]]
Use tee to duplicate the generator, then use islice to create new generators that each start at a different position and advance n_processes steps at a time.
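A variant that is not part of the original answer, offered as a sketch: itertools.combinations is already lazy, so each chunk can build its own generator and stride through it with islice, with no tee involved. This sidesteps a cost of tee: it has to buffer every item that one copy has consumed but another has not, so when one copy is run to the end while the others are untouched (as in the list(x) calls above), the buffer ends up holding every pair. The helper name strided_pairs is hypothetical.

```python
from itertools import combinations, islice

def strided_pairs(n, k, n_chunks):
    """Hypothetical helper: lazily yield every n_chunks-th pair (i, j) with i < j,
    starting at position k. No tee, and the pairs are never materialized."""
    return islice(combinations(range(n), 2), k, None, n_chunks)

n, n_processes = 10, 3
# list() here only to show the contents; a worker would iterate lazily instead
chunks = [list(strided_pairs(n, k, n_processes)) for k in range(n_processes)]
print(chunks[0][:5])  # [(0, 1), (0, 4), (0, 7), (1, 2), (1, 5)]
```

The trade-off is that every chunk walks over all n*(n-1)/2 pairs and discards the ones it skips, so memory stays flat at the cost of some redundant iteration.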
Or, a complete example using processes:
from multiprocessing import Process
from itertools import combinations_with_replacement, islice, tee

# Note: handing these islice/tee objects to Process relies on the 'fork' start
# method. Under 'spawn' or 'forkserver', Process arguments are pickled, and the
# generator underneath these objects cannot be pickled.
n_processes = 3
n = 10  # num cols/rows in matrix
pairs = ((i, j) for i, j in combinations_with_replacement(range(n), 2) if i != j)
pair_chunks = [
    islice(p, i, None, n_processes)
    for i, p in enumerate(tee(pairs, n_processes))
]

def process_pairs(i, pair_chunk):
    print('process %d received type %s' % (i, type(pair_chunk)))
    for x in pair_chunk:
        print('process %d processing %s' % (i, x))

processes = [
    Process(target=process_pairs, args=[i, pair_chunk])
    for i, pair_chunk in enumerate(pair_chunks)
]
for p in processes:
    p.start()
for p in processes:
    p.join()
Output:
process 0 received type <class 'itertools.islice'>
process 1 received type <class 'itertools.islice'>
process 1 processing (0, 2)
process 1 processing (0, 5)
process 1 processing (0, 8)
process 1 processing (1, 3)
process 1 processing (1, 6)
process 1 processing (1, 9)
process 1 processing (2, 5)
process 1 processing (2, 8)
process 1 processing (3, 5)
process 0 processing (0, 1)
...
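A further variant, not from the answer above and meant only as a sketch: instead of creating the chunked iterators in the parent and passing them to Process, each worker can build its own lazy generator from a few integers. Only (k, n, n_processes) is sent to the child, so nothing unpicklable crosses the process boundary, and each worker's memory use stays O(n) rather than O(n²). The worker name process_chunk and the counting placeholder are hypothetical; the loop body is where the real per-element work would go.

```python
from itertools import combinations, islice
from multiprocessing import Pool

def process_chunk(args):
    """Hypothetical worker: builds its own generator over the strict upper
    triangle and handles every n_processes-th pair, starting at offset k."""
    k, n, n_processes = args
    count = 0
    for i, j in islice(combinations(range(n), 2), k, None, n_processes):
        count += 1  # placeholder for "do some stuff" with element (i, j)
    return count

def run(n, n_processes):
    # Each task is just three small ints, which pickle fine under any start method
    tasks = [(k, n, n_processes) for k in range(n_processes)]
    with Pool(n_processes) as pool:
        return pool.map(process_chunk, tasks)

if __name__ == '__main__':
    print(run(10, 3))  # [15, 15, 15]
```

Striding (rather than giving each worker a contiguous block of rows) keeps the load balanced, since early rows of the upper triangle are much longer than later ones.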
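Answer 1 (score: 0)
Presumably, if you don't want the indices, the generator will need to return the values directly.
Likewise, it sounds like you don't want to flatten the upper triangle, so the row vectors need to stay distinct.
Given these presumed requirements, here is a generator that yields successive slices of the row vectors: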
>>> def generate_upper_triangular(m):
...     for i, row in enumerate(m):
...         yield row[i:]
...
>>> m = [[1, 2, 3],
...      [0, 5, 6],
...      [0, 0, 9]]
>>> for vec in generate_upper_triangular(m):
...     print(vec)
...
[1, 2, 3]
[5, 6]
[9]
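Note that row[i:] includes the diagonal, whereas itertools.combinations(range(n), 2) in the question yields only pairs with i < j. A hedged variant that matches the question's pairs, and also keeps the row index so each value's column j can be recovered, might look like this; the name generate_strict_upper is hypothetical.

```python
def generate_strict_upper(m):
    """Hypothetical helper: yield (i, row[i+1:]) for each row, i.e. the strictly
    upper part of row i (diagonal excluded) together with the row index i."""
    for i, row in enumerate(m):
        yield i, row[i + 1:]

m = [[1, 2, 3],
     [0, 5, 6],
     [0, 0, 9]]
for i, tail in generate_strict_upper(m):
    for offset, value in enumerate(tail):
        j = i + 1 + offset  # recover the column index of this value
        print((i, j), value)
# prints:
# (0, 1) 2
# (0, 2) 3
# (1, 2) 6
```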