Question

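I am completely new to multiprocessing. I am trying to change my code so that part of it runs concurrently.

I have a huge list, and I have to call an API for each node in it. Since the API calls are independent, I don't need the result of the first one to move on to the second. So I have this code: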

import json
from multiprocessing import Pool

def xmlpart1(id):
    # ..call the api..
    # ..retrieve the xml..
    # ..find the part of xml I want..
    return xml_part1

def xmlpart2(id):
    # ..call the api..
    # ..retrieve the xml..
    # ..find the part of xml I want..
    return xml_part2

def main(index):
    mylist = [[..,..],[..,..],[..,..],[..,...]] # A huge list of lists with ids I need for calling the APIs
    myL = mylist[index]
    mydic = {}
    for i in myL:
        flag1 = xmlpart1(i)
        flag2 = xmlpart2(i)
        mydic[flag1] = flag2

    # One output file per sublist, named after its index
    root = "myfilename %s.json" % (str(index))

    with open(root, "wb") as f:
        json.dump(mydic, f)

if __name__ == '__main__':
    Pool().map(main, [0, 1, 2, 3])

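After some suggestions from here and from chat, I ended up with this code. The problem still persists. I ran the script at 9:50. At 10:25 the first file, "myfilename 0.json", appeared in my folder. It is now 11:25 and none of the other files have appeared. The sublists all have the same length and perform the same operations, so they should take roughly the same time.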

Answer 1

This is a better fit for the multiprocessing.Pool() class.

Here is a simple example:

from multiprocessing import Pool

def job(args):
    """Your job function"""


Pool().map(job, inputs)

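Where:

inputs is your list of inputs. Each input is passed to job and processed in a separate process.

When all the jobs are done, you get the results back as a list.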
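
To make the two points above concrete, here is a minimal sketch, assuming Python 3 (where Pool can be used in a with-block). The job body, the run helper, and the pool size of 2 are all made up for illustration; your real job would call the API instead. Each result carries the id of the process that produced it, so you can see that the inputs were handled by worker processes and that the results come back as one list in input order.

```python
import os
from multiprocessing import Pool

def job(n):
    """Hypothetical job: return the input squared and the id of the process that ran it."""
    return n * n, os.getpid()

def run(inputs, processes=2):
    # map() blocks until every input has been processed,
    # then returns the results as a list in input order.
    with Pool(processes=processes) as pool:
        return pool.map(job, inputs)

if __name__ == "__main__":
    results = run([1, 2, 3, 4])
    # One result per input, in the same order as the inputs.
    print([square for square, _ in results])
    # The ids of the worker processes that handled the inputs.
    print({pid for _, pid in results})
```

The if __name__ == "__main__" guard matters here: on platforms that start workers by importing the main module, code outside the guard would run again in every worker.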

multiprocessing.Pool().map works like Python's built-in map(), but it sets up a pool of worker processes for you and passes each input to the given function.

For details, see the documentation: http://docs.python.org/2/library/multiprocessing.html
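
As a rough illustration of the comparison with the built-in map(), the sketch below runs the same function both ways and checks that the results match. It assumes Python 3; double, serial, and parallel are hypothetical names chosen for this example only.

```python
from multiprocessing import Pool

def double(x):
    """Hypothetical stand-in for real work."""
    return 2 * x

def serial(inputs):
    # Built-in map(): every call runs in the current process.
    return list(map(double, inputs))

def parallel(inputs):
    # Pool.map(): the calls are spread over worker processes,
    # but the result list has the same contents and order.
    with Pool() as pool:
        return pool.map(double, inputs)

if __name__ == "__main__":
    data = list(range(10))
    assert serial(data) == parallel(data)
    print(parallel(data))
```

For a function this cheap the pool is slower than the plain map() because of process start-up and data transfer; the pool pays off when each call does substantial work, or waits on something slow such as an API request.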
