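I want to do batch processing on my large dataset. I want to run n threads in parallel to process each batch of n data samples. Here is a mock-up with a simple function: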
from threading import Thread
import queue

def funct(i, a, b, out_que):
    res = {}
    res[i] = a+b
    out_que.put(res)

# I try to run 3 threads in parallel, 5 times
res_list = []
for i in range(5):
    threads = []
    res_queue = queue.Queue()
    for j in range(3):
        thread = Thread(target=funct, args=(j, 5, 6, res_queue))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    resulted = res_queue.get()
    res_list.append(resulted)
print(res_list)
I want to get the following result:
[{0: 11}, {1: 11}, {2: 11}, {3: 11}, {4: 11}, {5: 11}, {6: 11}, {7: 11}, {8: 11}, {9: 11}, {10: 11}, {11: 11}, {12: 11}, {13: 11}, {14: 11}]
However, I get:
[{0: 11}, {0: 11}, {0: 11}, {0: 11}, {0: 11}]
Answer (score: 1)
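You take only one element off the queue with res_queue.get(), once per batch, so two of the three results put on the queue in every batch are never read. Change this part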
resulted = res_queue.get()
res_list.append(resulted)
to this:
# every thread has been joined, so the queue already holds all results of this batch
while not res_queue.empty():
    res_list.append(res_queue.get())
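Because every thread is joined before the queue is read, res_queue.empty() is reliable at that point. An equivalent option, sketched below with a hypothetical helper named drain (not part of the original code), is to call get() exactly once per started thread, which does not rely on empty() at all. The results are sorted only to make the printed output deterministic, since threads may finish in any order.

```python
from threading import Thread
import queue

def funct(i, a, b, out_que):
    # Same mock worker as in the question: put a one-entry dict on the queue.
    out_que.put({i: a + b})

def drain(out_que, n_items):
    # Hypothetical helper: take exactly n_items results off the queue.
    # get() blocks until an item is available, so this would also work
    # if it were called before every thread had finished.
    return [out_que.get() for _ in range(n_items)]

res_queue = queue.Queue()
threads = [Thread(target=funct, args=(j, 5, 6, res_queue)) for j in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

batch = drain(res_queue, len(threads))
# Sort by each dict's single key, because completion order is not guaranteed.
print(sorted(batch, key=lambda d: next(iter(d))))
# [{0: 11}, {1: 11}, {2: 11}]
```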
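To get a result like [{0: 11}, {1: 11}, {2: 11}, {3: 11}, {4: 11}, {5: 11}, {6: 11}, {7: 11}, {8: 11}, {9: 11}, {10: 11}, {11: 11}, {12: 11}, {13: 11}, {14: 11}], you need to pass 3*i + j instead of j in args=(j, 5, 6, res_queue), that is, args=(3*i + j, 5, 6, res_queue), where 3 is the number of threads per batch.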
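Putting both changes together, a minimal runnable version of the question's mock might look like the following. It assumes the question's batch size of 3 (named n_threads here, a name introduced only for readability) and sorts the collected dicts by key at the end, because threads can finish, and therefore put their results on the queue, in any order.

```python
from threading import Thread
import queue

def funct(i, a, b, out_que):
    # Mock worker from the question: report a + b under the global sample index i.
    out_que.put({i: a + b})

n_batches = 5   # outer loop count from the question
n_threads = 3   # threads (samples) per batch

res_list = []
for i in range(n_batches):
    threads = []
    res_queue = queue.Queue()
    for j in range(n_threads):
        # Global sample index: batch offset plus position inside the batch.
        thread = Thread(target=funct, args=(n_threads * i + j, 5, 6, res_queue))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    # All threads of this batch are joined, so empty() is reliable here.
    while not res_queue.empty():
        res_list.append(res_queue.get())

# Completion order within a batch is not guaranteed; sort by each dict's single key.
res_list.sort(key=lambda d: next(iter(d)))
print(res_list)
```

If ordered results matter more than managing threads by hand, concurrent.futures.ThreadPoolExecutor is worth considering as an alternative: its map() method returns results in the order of its inputs rather than in completion order.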