Problem when trying to make two child processes share the load of processing the same resource

Time: 2019-04-29 14:23:02

Tags: python process multiprocessing python-multiprocessing

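I am experimenting with the Python multiprocessing module, but something is not working the way I expected, so now I am a bit confused.

In a Python script, I create two child processes so that they can work on the same resource. I assumed they would "share" the load more or less equally, but instead one of the processes handles a single item while the other handles almost everything.

To test it, I wrote the following code: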

#!/usr/bin/python

import os
import multiprocessing

# Worker function
def worker(queueA, queueB):
    while(queueA.qsize() != 0):
        item = queueA.get()
        item = "item: " + item + ". processed by worker " + str(os.getpid())
        queueB.put(item)
    return

# IPC Manager
manager = multiprocessing.Manager()
queueA = multiprocessing.Queue()
queueB = multiprocessing.Queue()

# Fill queueA with data
for i in range(0, 10):
    queueA.put("hello" + str(i+1))

# Create processes
process1 = multiprocessing.Process(target = worker, args = (queueA, queueB,))
process2 = multiprocessing.Process(target = worker, args = (queueA, queueB,))

# Call processes
process1.start()
process2.start()

# Wait for processes to stop processing
process1.join()
process2.join()

for i in range(0, queueB.qsize()):
    print queueB.get()

It prints the following:

item: hello1. processed by worker 11483
item: hello3. processed by worker 11483
item: hello4. processed by worker 11483
item: hello5. processed by worker 11483
item: hello6. processed by worker 11483
item: hello7. processed by worker 11483
item: hello8. processed by worker 11483
item: hello9. processed by worker 11483
item: hello10. processed by worker 11483
item: hello2. processed by worker 11482

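As you can see, one of the processes takes only one element and never goes back to the queue for more, while the other has to process everything else.

I think this is not correct, or at least it is not what I expected. Could you tell me the correct way to implement this idea?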

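1 answer:

Answer 0: (score: 1)

You are right that the split will not be exactly equal, but that is mostly because your test sample is so small. Each process needs time to start up and begin processing. The time it takes to process one item in the queue is extremely short, so one process can quickly work through 9 items before the other gets through a single one.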
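To get a feel for the gap described above, here is a rough sketch that times the full lifetime of a do-nothing child process against the cost of building one result string. The helper names (noop, time_process_lifetime, time_per_item) are illustrative and not from the original post, and the string-building loop leaves out queue overhead, so treat the numbers as an order-of-magnitude hint only.

```python
import multiprocessing
import time


def noop():
    """Do nothing; used only to time how long a child process takes."""
    pass


def time_process_lifetime():
    """Return the seconds needed to start and join one no-op process."""
    start = time.perf_counter()
    p = multiprocessing.Process(target=noop)
    p.start()
    p.join()
    return time.perf_counter() - start


def time_per_item(n=1000):
    """Return the average seconds to build one result string."""
    start = time.perf_counter()
    for i in range(n):
        # Same kind of string concatenation the question's worker performs
        _ = "item: hello" + str(i) + ". processed by worker 0"
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("process start + join: %.6f s" % time_process_lifetime())
    print("one item (average):   %.9f s" % time_per_item())
```

If the first number turns out to be much larger than the second, that would be consistent with one worker emptying a 10-item queue before the other has finished starting.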

I tested this below (in Python 3, but it should also work in 2.7; just change the print() function to a print statement):

import os
import multiprocessing

# Worker function
def worker(queueA, queueB):
    for item in iter(queueA.get, 'STOP'):
        out = str(os.getpid())
        queueB.put(out)
    return

# Queues shared with the workers
queueA = multiprocessing.Queue()
queueB = multiprocessing.Queue()

# Fill queueA with data
for i in range(0, 1000):
    queueA.put("hello" + str(i+1))

# Create processes
process1 = multiprocessing.Process(target = worker, args = (queueA, queueB,))
process2 = multiprocessing.Process(target = worker, args = (queueA, queueB,))

# Call processes
process1.start()
process2.start()

queueA.put('STOP')
queueA.put('STOP')

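# Drain queueB before joining: a process that has put items on a
# multiprocessing.Queue may not exit until its buffered items are flushed,
# so joining first can deadlock on larger outputs
counts = {}
for i in range(1000):
    pid = queueB.get()
    counts[pid] = counts.get(pid, 0) + 1

# Wait for processes to stop processing
process1.join()
process2.join()

print(counts)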

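My output (a count of how many items each process completed):

{'18376': 537, '18377': 463}

Although the counts are not exactly the same, they will approach an even split over a longer run.

Edit:
Another way to confirm this is to add a time.sleep(3) inside the worker function (with import time at the top of the script):

def worker(queueA, queueB):
    for item in iter(queueA.get, 'STOP'):
        time.sleep(3)
        out = str(os.getpid())
        queueB.put(out)
    return

I ran a range(10) test with this worker, like your original example, to check this.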
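The worker in this answer relies on the two-argument form of iter(), which keeps calling queueA.get until it returns the sentinel 'STOP'. Below is a minimal sketch of that pattern in isolation. It assumes an in-process queue.Queue in place of multiprocessing.Queue so that it runs without child processes, and the names drain and SENTINEL are illustrative, not part of the answer.

```python
import queue

SENTINEL = "STOP"  # illustrative name; the answer uses the literal 'STOP'


def drain(in_queue):
    """Collect items from in_queue until the sentinel is seen."""
    # iter(callable, sentinel) calls in_queue.get() repeatedly and stops
    # as soon as the returned value equals the sentinel.
    return [item for item in iter(in_queue.get, SENTINEL)]


q = queue.Queue()
for i in range(3):
    q.put("hello" + str(i + 1))
q.put(SENTINEL)

print(drain(q))  # ['hello1', 'hello2', 'hello3']
```

With two workers, each one consumes its own sentinel, which is why the answer puts 'STOP' on the queue twice. Compared with looping on queueA.qsize() != 0, this avoids the case where the other worker takes the last item between the size check and the get() call, which would leave get() blocked.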