How to dynamically create per-process queues in Python multiprocessing

Date: 2011-08-16 02:04:58

Tags: python multiprocessing

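I want to dynamically create multiple Processes, where each instance has a queue for incoming messages from other instances, and each instance can also create new instances. So we end up with a network of processes sending to each other. Every instance is allowed to send to every other one.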

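My code almost does what I want: it uses a Manager.dict() to store the queues, making sure updates are propagated, and a Lock() to protect write access to the queues. However, when a new queue is added it throws "RuntimeError: Queue objects should only be shared between processes through inheritance".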
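The error comes from pickling: storing a value in a Manager.dict() pickles it to send it to the manager process, and a multiprocessing.Queue refuses to be pickled except while a child process is being created. A minimal sketch that reproduces the message without starting any process (the helper name pickling_error is hypothetical):

```python
import multiprocessing as mp
import pickle


def pickling_error(obj):
    """Return the RuntimeError message if obj refuses to be pickled, else None."""
    try:
        pickle.dumps(obj)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    q = mp.Queue()
    # Assigning q into a Manager.dict() pickles it the same way, outside of
    # process creation, so it fails with the same message.
    print(pickling_error(q))
    # Ordinary data pickles fine, which is why Manager.dict() accepts it.
    print(pickling_error({"island": 0, "migrants": []}))
```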

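The problem is that at startup we don't know how many queues we will eventually need, so we have to create them dynamically. But since we can't share queues except at construction time, I don't know how to do that.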

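I know one possibility would be to make queues a global variable instead of a managed one passed in to __init__: but then the problem, as far as I understand, is that changes to the queues variable would not propagate to the other processes.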
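A different route, swapped in here and not taken from the question or the answers: keep every inbox inside a manager's server process and give the worker processes only a proxy to a registry object. Proxies, unlike multiprocessing.Queue, can be pickled, so they can be handed to processes created at any time, including by other children. Registry, IslandManager and island are hypothetical names in this sketch:

```python
import multiprocessing as mp
import queue
import threading
from multiprocessing.managers import BaseManager


class Registry:
    """Island id -> inbox queue. The instance that matters lives in the manager."""

    def __init__(self):
        self._inboxes = {}
        self._lock = threading.Lock()  # the manager serves clients in threads

    def new_island(self):
        with self._lock:
            node_id = len(self._inboxes)
            self._inboxes[node_id] = queue.Queue()
            return node_id

    def island_ids(self):
        with self._lock:
            return list(self._inboxes)

    def put(self, node_id, msg):
        with self._lock:
            inbox = self._inboxes[node_id]
        inbox.put(msg)

    def drain(self, node_id):
        """Remove and return everything currently in one inbox."""
        with self._lock:
            inbox = self._inboxes[node_id]
        items = []
        while True:
            try:
                items.append(inbox.get_nowait())
            except queue.Empty:
                return items


_REGISTRY = Registry()  # only the copy inside the manager process is used


def _get_registry():
    return _REGISTRY


class IslandManager(BaseManager):
    pass


IslandManager.register("registry", callable=_get_registry)


def island(registry, node_id, n_children):
    # The registry proxy can be passed on to processes started at any time,
    # so this island can create further islands itself.
    for _ in range(n_children):
        child_id = registry.new_island()
        mp.Process(target=island, args=(registry, child_id, 0)).start()
    # Message every island registered so far, except ourselves.
    for other_id in registry.island_ids():
        if other_id != node_id:
            registry.put(other_id, ("hello from", node_id))


if __name__ == "__main__":
    with IslandManager() as manager:
        registry = manager.registry()
        root_id = registry.new_island()
        root = mp.Process(target=island, args=(registry, root_id, 2))
        root.start()
        root.join()  # a process waits for its non-daemon children at exit
        for node_id in registry.island_ids():
            print(node_id, registry.drain(node_id))
```

Every put and drain is a round trip through the manager process, so this trades throughput for simplicity; the socket-based answers send directly between processes instead.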

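Edit: I'm working on evolutionary algorithms. EAs are a machine learning technique. An EA simulates a "population" which evolves through survival of the fittest, crossover and mutation. In parallel EAs, as here, we also have migration between populations, which corresponds to interprocess communication. Islands can also spawn new islands, so we need a way to send messages between dynamically created processes.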
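To make the migration step concrete, here is a single-process sketch of one ring-migration step. In the multi-process version each island would be its own process, and emigrants[i - 1] would arrive as a message from island i - 1. The ring topology, the parameter k and the name migrate are illustrative assumptions, not details from the question:

```python
def migrate(islands, fitness, k=1):
    """One ring-migration step of an island-model EA (a sketch).

    Each island sends copies of its k fittest individuals to the next island
    in the ring, where they replace that island's k least fit individuals.
    """
    ranked = [sorted(pop, key=fitness, reverse=True) for pop in islands]
    emigrants = [pop[:k] for pop in ranked]
    return [
        # all but the k least fit, plus the previous island's emigrants
        ranked[i][: len(ranked[i]) - k] + emigrants[i - 1]
        for i in range(len(islands))
    ]


print(migrate([[1, 2, 3], [10, 20, 30], [5, 6, 7]], fitness=lambda x: x))
# -> [[3, 2, 7], [30, 20, 3], [7, 6, 30]]
```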

3 answers:

Answer 0 (score: 3)

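I'm not entirely sure what your use case actually is. Perhaps if you elaborate on why you want each process to dynamically spawn a child with a connected queue, the right solution in this situation will become clearer.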

Anyway, the problem is that right now there doesn't seem to be a good way to dynamically create pipes or queues with multiprocessing.

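I think that if you're willing to spawn threads within each of your processes, you could use multiprocessing.connection.Listener/Client to communicate back and forth. Rather than spawning threads, I took an approach using network sockets and select() to communicate between processes.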
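A minimal sketch of the Listener/Client route: each process owns a Listener served by a background thread that drops received objects into a local queue.Queue, and peers send with Client. AUTHKEY, start_inbox and send are hypothetical names; in a real program the returned address would be published somewhere shared, for example a Manager().dict().

```python
import queue
import threading
from multiprocessing import AuthenticationError
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret for all islands


def start_inbox(host="127.0.0.1"):
    """Listen on a free port and collect incoming objects in a daemon thread.

    Returns (address, inbox): peers send to `address`, and the owning process
    reads what arrived from the queue.Queue `inbox`.
    """
    listener = Listener((host, 0), authkey=AUTHKEY)  # port 0: the OS picks one
    inbox = queue.Queue()

    def serve():
        while True:
            try:
                conn = listener.accept()
            except AuthenticationError:
                continue  # ignore peers that don't know the key
            with conn:
                try:
                    while True:
                        inbox.put(conn.recv())  # unpickles one object
                except EOFError:
                    pass  # the sender closed its end

    threading.Thread(target=serve, daemon=True).start()
    return listener.address, inbox


def send(address, obj):
    """Connect to a peer's inbox and send one picklable object."""
    with Client(address, authkey=AUTHKEY) as conn:
        conn.send(obj)


if __name__ == "__main__":
    address, inbox = start_inbox()
    send(address, ("migrants", [3, 1, 4]))  # normally done by another process
    print(inbox.get(timeout=5))
```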

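Dynamic process spawning and network sockets may still be flaky depending on how multiprocessing cleans up your file descriptors when spawning/forking a new process, and your solution will most likely work more easily on *nix derivatives. If you're concerned about socket overhead, you could use Unix domain sockets to be a little more lightweight, at the cost of added complexity if you want to run nodes on multiple worker machines.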
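For the Unix-domain-socket variant, the main change is the address: a filesystem path with AF_UNIX instead of a (host, port) pair with AF_INET. A POSIX-only sketch in one process; the helper name and socket path are hypothetical, and because nothing reads until accept(), the payload should stay small enough to fit in the socket buffer:

```python
import os
import socket
import tempfile


def unix_roundtrip(payload: bytes) -> bytes:
    """Send payload over a Unix domain socket and return what arrives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "node-0.sock")  # hypothetical per-node path
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)  # completes via the listen backlog
                client.sendall(payload)
            # The client is closed now, so the reader sees EOF after payload.
            conn, _ = server.accept()
            with conn:
                chunks = []
                while chunk := conn.recv(4096):
                    chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(unix_roundtrip(b"hello island"))
```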

Anyway, here is an example that uses network sockets and a global process list to accomplish this, since I couldn't find a good way to make multiprocessing do it.

import collections
import multiprocessing
import random
import select
import socket
import time


class MessagePassingProcess(multiprocessing.Process):
    def __init__(self, id_, processes):
        self.id = id_
        self.processes = processes
        self.queue = collections.deque()
        super(MessagePassingProcess, self).__init__()

    def run(self):
        print "Running"
        inputs = []
        outputs = []
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        address = self.processes[self.id]["address"]
        print "Process %s binding to %s"%(self.id, address)
        server.bind(address)
        server.listen(5)
        inputs.append(server)
        process = self.processes[self.id]
        process["listening"] = True
        self.processes[self.id] = process
        print "Process %s now listening!(%s)"%(self.id, process)
        while inputs:
            readable, writable, exceptional = select.select(inputs,
                                                           outputs,
                                                           inputs,
                                                           0.1)
            for sock in readable:
                print "Process %s has a readable socket: %s"%(self.id,
                                                              sock)
                if sock is server:
                    print "Process %s has a readable server socket: %s"%(self.id,
                                                              sock)
                    conn, addr = sock.accept()
                    conn.setblocking(0)
                    inputs.append(conn)
                else:
                    data = sock.recv(1024)
                    if data:
                        self.queue.append(data)
                        print "non server readable socket with data"
                    else:
                        inputs.remove(sock)
                        sock.close()
                        print "non server readable socket with no data"

            for sock in exceptional:
                print "exception occurred on socket %s"%(sock)
                inputs.remove(sock)
                sock.close()

            while len(self.queue) >= 1:
                print "Received:", self.queue.pop()

            # send a message to a random process:
            random_id = random.choice(list(self.processes.keys()))
            print "%s Attempting to send message to %s"%(self.id, random_id)
            random_process = self.processes[random_id]
            print "random_process:", random_process
            if random_process["listening"]:
                random_address = random_process["address"]
                s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                try:
                    s.connect(random_address)
                except socket.error:
                    print "%s failed to send to %s"%(self.id, random_id)
                else:
                    s.sendall("Hello World!")
                finally:
                    s.close()

            time.sleep(1)

if __name__=="__main__":
    print "hostname:", socket.getfqdn()
    print dir(multiprocessing)
    manager = multiprocessing.Manager()
    processes = manager.dict()
    joinable = []
    for n in xrange(multiprocessing.cpu_count()):
        mpp = MessagePassingProcess(n, processes)
        processes[n] = {"id":n,
                        "address":("127.0.0.1",7000+n),
                        "listening":False,
                        }
        print "processes[%s] = %s"%(n, processes[n])
        mpp.start()
        joinable.append(mpp)
    for process in joinable:
        process.join()

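With a lot of polish and testing love, this might be a logical extension to multiprocessing.Process and/or multiprocessing.Pool, since it seems like something people would use if it were available in the standard library. It would also be reasonable to create a DynamicQueue class that uses sockets to be discoverable by other queues.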
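A minimal sketch of that DynamicQueue idea. DynamicQueue, put_to and the registry layout are hypothetical: the registry is any shared mapping from a queue name to a (host, port) pair (in a multiprocessing program, for example, a Manager().dict()), and each message is one pickled object per connection, as in the code above.

```python
import pickle
import queue
import socket
import threading


class DynamicQueue:
    """Hypothetical socket-backed queue that other processes find by name.

    Only the owning process calls get(); any process that can read the
    registry can call DynamicQueue.put_to(). Unpickling network data is
    only safe between trusted local peers.
    """

    def __init__(self, name, registry, host="127.0.0.1"):
        self._items = queue.Queue()
        self._server = socket.create_server((host, 0))  # the OS picks the port
        registry[name] = self._server.getsockname()[:2]  # publish (host, port)
        threading.Thread(target=self._accept_loop, daemon=True).start()

    def _accept_loop(self):
        while True:
            conn, _ = self._server.accept()
            with conn:
                chunks = []
                while True:  # one message per connection: read until EOF
                    chunk = conn.recv(4096)
                    if not chunk:
                        break
                    chunks.append(chunk)
            if chunks:
                self._items.put(pickle.loads(b"".join(chunks)))

    def get(self, timeout=None):
        return self._items.get(timeout=timeout)

    @staticmethod
    def put_to(registry, name, obj):
        with socket.create_connection(registry[name]) as s:
            s.sendall(pickle.dumps(obj))


if __name__ == "__main__":
    registry = {}  # stand-in for a Manager().dict()
    inbox = DynamicQueue("island-0", registry)
    DynamicQueue.put_to(registry, "island-0", {"migrants": [1, 2]})
    print(inbox.get(timeout=5))
```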

Anyway, I hope it helps. Please post an update if you figure out a better way to make this work.

Answer 1 (score: 3)

This code is based on the accepted answer. It's in Python 3 because OS X Snow Leopard segfaults on some uses of multiprocessing.

#!/usr/bin/env python3

import collections
from multiprocessing import Process, Manager, Lock, cpu_count
import random
import select
import socket
import time
import pickle

class Message:
    def __init__(self, origin):
        self.type = "long_msg"
        self.data = "X" * 3000
        self.origin = origin
    def __str__(self):
        return "%s %d" % (self.type, self.origin)

class MessagePassingProcess(Process):
    def __init__(self, processes, lock):
        self.lock = lock
        self.processes = processes
        with self.lock:
            self.id = len(list(processes.keys()))
            process_dict = {"id": self.id,
                            "address": ("127.0.0.1", 7000 + self.id),
                            "listening": False
                            }
            self.processes[self.id] = process_dict
        print("new process: processes[%s] = %s" % (self.id, processes[self.id]))
        self.queue = collections.deque()
        super(MessagePassingProcess, self).__init__()

    def run(self):
        print("Running")
        self.processes[self.id]["joinable"] = True
        inputs = []
        outputs = []
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        address = self.processes[self.id]["address"]
        print("Process %s binding to %s" % (self.id, address))
        server.bind(address)
        server.listen(5)
        inputs.append(server)
        process = self.processes[self.id]
        process["listening"] = True
        self.processes[self.id] = process
        print("Process %s now listening!(%s)" % (self.id, process))
        while inputs and len(list(self.processes.keys())) < 10:
            readable, writable, exceptional = select.select(inputs,
                                                           outputs,
                                                           inputs,
                                                           0.1)
            # read incoming messages
            for sock in readable:
                print("Process %s has a readable socket: %s" % (self.id, sock))
                if sock is server:
                    print("Process %s has a readable server socket: %s" %
                          (self.id, sock))
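                    conn, addr = sock.accept()
                    # Leave conn blocking: each sender writes one pickled
                    # message and closes, so the recv() loop ends at EOF.
                    # (A non-blocking recv() could raise BlockingIOError
                    # if the rest of the message had not arrived yet.)
                    inputs.append(conn)
                else:
                    item = bytes()  # accumulate the chunks of one message
                    recvs = 0
                    while True:
                        data = sock.recv(1024)
                        if not data:  # b"" means the sender closed
                            break
                        item += data
                        recvs += 1
                    # One message per connection, so this socket is done.
                    inputs.remove(sock)
                    sock.close()
                    if len(item):
                        self.queue.append(item)
                        print("non server readable socket: recvd %d bytes in %d parts"
                              % (len(item), recvs))
                    else:
                        print("non server readable socket: nothing to read")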

            for sock in exceptional:
                print("exception occurred on socket %s" % (sock))
                if sock in inputs:  # it may already have been closed above
                    inputs.remove(sock)
                    sock.close()

            while len(self.queue):
                msg = pickle.loads(self.queue.pop())
                print("received:" + str(msg))

            # send a message to a random process:
            random_id = random.choice(list(self.processes.keys()))
            print("%s attempting to send message to %s" % (self.id, random_id))
            random_process = self.processes[random_id]
            if random_process["listening"]:
                random_address = random_process["address"]
                s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                try:
                    s.connect(random_address)
                except socket.error:
                    print("%s failed to send to %s"%(self.id, random_id))
                else:
                    item = pickle.dumps(Message(self.id))
                    print("sending a total of %d bytes" % len(item))
                    s.sendall(item)
                finally:
                    s.close()

            # make a new process
            if random.random() < 0.1:
                mpp = MessagePassingProcess(self.processes, self.lock)
                mpp.start()
            else:
                time.sleep(1.0)
        print("process %d finished looping" % self.id)


if __name__=="__main__":
    manager = Manager()
    processes = manager.dict()
    lock = Lock()
    # make just one process: it will make more
    mpp = MessagePassingProcess(processes, lock)
    mpp.start()
    # this doesn't join on all the other processes created
    # subsequently
    mpp.join()
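The code above finds the end of a message by reading until the sender closes the connection, so every message needs its own TCP connection. A common alternative, swapped in here and not used by the answers, is to prefix each pickled message with its length so that several messages can share one connection. The 4-byte header and the names send_msg, recv_exactly and recv_msg are assumptions in this sketch:

```python
import pickle
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (an assumption)


def send_msg(sock, obj):
    """Send one object as <length><pickle bytes>."""
    payload = pickle.dumps(obj)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed in the middle of a message")
        buf += chunk
    return bytes(buf)


def recv_msg(sock):
    """Read one length-prefixed message and unpickle it."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return pickle.loads(recv_exactly(sock, length))


if __name__ == "__main__":
    a, b = socket.socketpair()  # two connected sockets, for demonstration
    send_msg(a, {"type": "long_msg", "data": "X" * 3000})
    send_msg(a, "second message on the same connection")
    print(recv_msg(b)["type"], recv_msg(b))
    a.close()
    b.close()
```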

Answer 2 (score: 1)

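Using the standard library's socketserver avoids having to program select() by hand. In this version we start a socketserver in a separate thread, so that each process can do (well, pretend to do) computation in its main loop.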
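The core of that pattern can be shown on its own. This sketch uses the ready-made socketserver.ThreadingTCPServer, which is the same ThreadingMixIn plus TCPServer combination the answer builds by hand; start_server and the one-message-per-connection handler are assumptions. Note that the server thread runs in whichever process calls start_server, so in a multiprocessing program it would presumably be called from inside run().

```python
import queue
import socket
import socketserver
import threading


def start_server(inbox, host="127.0.0.1"):
    """Serve on a free port in a daemon thread; each message goes to inbox.

    One message per connection: the handler reads until the client closes.
    """

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            chunks = []
            while True:
                chunk = self.request.recv(4096)
                if not chunk:  # the client closed its side
                    break
                chunks.append(chunk)
            if chunks:
                inbox.put(b"".join(chunks))

    server = socketserver.ThreadingTCPServer((host, 0), Handler)
    server.daemon_threads = True  # don't block interpreter exit on handlers
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    inbox = queue.Queue()
    server = start_server(inbox)
    with socket.create_connection(server.server_address) as s:
        s.sendall(b"migrants")
    print(inbox.get(timeout=5))
    server.shutdown()  # stops serve_forever() in the other thread
    server.server_close()
```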

#!/usr/bin/env python3

# Each Node is an mp.Process. It opens a client-side socket to send a
# message to another Node. Each Node listens using a separate thread
# running a socketserver (so avoiding manual programming of select()),
# which itself starts a new thread to handle each incoming connection.
# The socketserver puts received messages on an mp.Queue, where they
# are picked up by the Node for processing once per loop. This setup
# allows the Node to do computation in its main loop.

import multiprocessing as mp
import threading, random, socket, socketserver, time, pickle, queue

class Message:
    def __init__(self, origin):
        self.type = "long_message"
        self.data = "X" * random.randint(0, 2000)
        self.origin = origin
    def __str__(self):
        return "Message of type %s, length %d from %d" % (
            self.type, len(self.data), self.origin)

class Node(mp.Process):
    def __init__(self, nodes, lock):
        super().__init__()

        # Add this node to the Manager.dict of node descriptors.
        # Write-access is protected by a Lock.
        self.nodes = nodes
        self.lock = lock
        with self.lock:
            self.id = len(list(nodes.keys()))
            host = "127.0.0.1"
            port = 7022 + self.id
            node = {"id": self.id, "address": (host, port), "listening": False}
            self.nodes[self.id] = node
        print("new node: nodes[%s] = %s" % (self.id, nodes[self.id]))

        # Set up socketserver.

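        # A collections.deque or queue.Queue doesn't work here because
        # this __init__ (and therefore the server thread started below)
        # runs in the process that *creates* this Node, not in the Node's
        # own process; an mp.Queue carries messages across. This also
        # relies on the "fork" start method: with "spawn", the Node,
        # including its server and thread, would have to be pickled.
        self.queue = mp.Queue()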

        # This MixIn usage is directly from the python.org
        # socketserver docs
        class ThreadedTCPServer(socketserver.ThreadingMixIn,
                                socketserver.TCPServer):
            pass
        class HandlerWithQueue(socketserver.BaseRequestHandler):
            # Something of a hack: using class variables to give the
            # Handler access to this Node-specific data
            handler_queue = self.queue
            handler_id = self.id
            def handle(self):
                # could receive data in multiple chunks, so loop until
                # the sender closes the connection, concatenating chunks
                item = bytes()
                recvs = 0
                while True:
                    data = self.request.recv(4096)
                    if not data:  # b"" means the sender has closed
                        break
                    item += data
                    recvs += 1
                if len(item):
                    # Receive a pickle here and put it straight on
                    # queue. Will be unpickled when taken off queue.
                    print("%d: socketserver received %d bytes in %d recv()s"
                          % (self.handler_id, len(item), recvs))
                    self.handler_queue.put(item)

        self.server = ThreadedTCPServer((host, port), HandlerWithQueue)
        self.server_thread = threading.Thread(target=self.server.serve_forever)
        # Daemon thread: it exits along with the process that started it.
        self.server_thread.daemon = True
        self.server_thread.start()
        print("%d: server loop running in thread %s" %
              (self.id, self.server_thread.name))

        # Now ready to receive
        with self.lock:
            # Careful: if we assign directly to
            # self.nodes[self.id]["listening"], the new value *won't*
            # be propagated to other Nodes by the Manager.dict. Have
            # to use this hack to re-assign the Manager.dict key.
            node = self.nodes[self.id]
            node["listening"] = True
            self.nodes[self.id] = node

    def send(self):
        # Find a destination. All listening nodes are eligible except self.
        dests = [node for node in self.nodes.values()
                 if node["id"] != self.id and node["listening"]]
        if len(dests) < 1:
            print("%d: no node to send to" % self.id)
            return
        dest = random.choice(dests)
        print("%d: sending to %s" % (self.id, dest["id"]))

        # send
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect(dest["address"])
        except socket.error:
            print("%s: failed to send to %s" % (self.id, dest["id"]))
        else:
            item = pickle.dumps(Message(self.id))
            s.sendall(item)
        finally:
            s.close()

    # Check our queue for incoming messages.
    def receive(self):
        while True:
            try:
                message = pickle.loads(self.queue.get(False))
                print("%d: received %s" % (self.id, str(message)))
            except queue.Empty:
                break

    def run(self):
        print("%d: in run()" % self.id)
        # Main loop. Loop until at least 10 Nodes exist. Because of
        # parallel processing we might get a few more
        while len(list(self.nodes.keys())) < 10:
            time.sleep(random.random() * 0.5) # simulate heavy computation
            self.send()
            time.sleep(random.random() * 0.5) # simulate heavy computation
            self.receive()
            # maybe make a new node
            if random.random() < 0.1:
                new = Node(self.nodes, self.lock)
                new.start()
        # Calling self.server.shutdown() here would hang: serve_forever()
        # runs in the process that created this Node, not in this one.
        # The server thread is a daemon, so it exits with that process.
        print("%d: finished" % self.id)

if __name__=="__main__":
    manager = mp.Manager()
    nodes = manager.dict()
    lock = mp.Lock()
    # make just one node: it will make more
    node0 = Node(nodes, lock)
    node0.start()
    # This doesn't join on all the other nodes created subsequently.
    # But everything seems to work out ok.
    node0.join()
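The comment above about self.nodes[self.id]["listening"] describes a general Manager.dict() pitfall: indexing returns a copy of the stored value, so mutating that copy changes nothing in the manager. A small sketch of the read-modify-write workaround; set_field is a hypothetical helper name, and the update is not atomic, so concurrent writers still need a lock as in the answer:

```python
import multiprocessing as mp


def set_field(shared, key, field, value):
    """Update one field of a dict stored in a (possibly managed) mapping.

    shared[key][field] = value only changes a local copy when shared is a
    Manager().dict(), so read the entry, modify it, and assign it back.
    Not atomic: hold a lock if several processes update the same key.
    """
    entry = shared[key]
    entry[field] = value
    shared[key] = entry


if __name__ == "__main__":
    with mp.Manager() as manager:
        nodes = manager.dict({0: {"listening": False}})
        nodes[0]["listening"] = True  # silently lost: edits a copy
        print(nodes[0]["listening"])  # False
        set_field(nodes, 0, "listening", True)
        print(nodes[0]["listening"])  # True
```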