Question

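So I have a problem. I'm trying to make my imports faster, so I started using the multiprocessing module to split a group of imports into two functions and run each function on a separate core, hoping to speed up the imports. But now the code doesn't recognize the modules at all. What am I doing wrong?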

import multiprocessing


def core1():
    import wikipedia
    import subprocess
    import random
    return wikipedia, subprocess, random



def core2():
    from urllib import request
    import json
    import webbrowser
    return request, json, webbrowser


if __name__ == "__main__":
    start_core_1 = multiprocessing.Process(name='worker 1', target=core1, args = core2())
    start_core_2 = multiprocessing.Process(name='worker 2', target=core2, args = core1())
    start_core_1.start()
    start_core_2.start()

while True:
    user = input('[!] ')
    with request.urlopen('https://api.wit.ai/message?v=20160511&q=%s&access_token=Z55PIVTSSFOETKSBPWMNPE6YL6HVK4YP' % request.quote(user)) as wit_api:  # call to wit.ai api
        wit_api_html = wit_api.read()
        wit_api_html = wit_api_html.decode()
        wit_api_data = json.loads(wit_api_html)
    intent = wit_api_data['entities']['Intent'][0]['value']
    term = wit_api_data['entities']['search_term'][0]['value']
    if intent == 'info_on':
        with request.urlopen('https://kgsearch.googleapis.com/v1/entities:search?query=%s&key=AIzaSyCvgNV4G7mbnu01xai0f0k9NL2ito8vY6s&limit=1&indent=True' % term.replace(' ', '%20')) as response:
            google_knowledge_base_html = response.read()
            google_knowledge_base_html = google_knowledge_base_html.decode()
            google_knowledge_base_data = json.loads(google_knowledge_base_html)
            print(google_knowledge_base_data['itemListElement'][0]['result']['detailedDescription']['articleBody'])
    else:
        print('Something')

Answer 1

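I think you are missing an important part of the picture, one you need to understand when using multiprocessing.

Here are some key points you have to understand; after that you will see why you can't just import modules in a child process and speed things up. Even returning the loaded modules is not a good answer.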

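First, when you use multiprocessing.Process, the child process is forked (on Linux) or spawned (on Windows). I'll assume you are using Linux. In that case, each child process inherits every module already loaded in the parent (the global state). When the child changes anything, such as a global variable or importing a new module, the change exists only in its own context. So the parent process never learns about it. I believe parts of this are relevant here too.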
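The isolation described above can be checked directly. This is a minimal sketch, assuming a fork-capable platform (Linux, as assumed above); the names `MODULE`, `child_import` and `import_in_child` are made up for illustration, and `colorsys` is just an arbitrary standard-library module.

```python
import importlib
import multiprocessing
import sys

# Assumption: Linux, so the "fork" start method is available.
ctx = multiprocessing.get_context("fork")

MODULE = "colorsys"  # arbitrary stdlib module used as a stand-in


def child_import(queue):
    # Runs in the child: the import only changes the child's sys.modules.
    importlib.import_module(MODULE)
    queue.put(MODULE in sys.modules)


def import_in_child():
    """Return (loaded_in_child, loaded_in_parent) after the child has exited."""
    sys.modules.pop(MODULE, None)  # make sure the parent starts without it
    queue = ctx.Queue()
    proc = ctx.Process(target=child_import, args=(queue,))
    proc.start()
    loaded_in_child = queue.get()  # read before join() to avoid blocking
    proc.join()
    return loaded_in_child, MODULE in sys.modules


if __name__ == "__main__":
    print(import_in_child())
```

The child reports that the module is loaded, while the parent's `sys.modules` still does not contain it. Note also that `Process` discards whatever the target function returns, so `return wikipedia, subprocess, random` in the question never reaches the parent.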

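Second, a module can be a collection of classes, bindings to external libs, functions and so on, and some of these very likely cannot be pickled, at least not with pickle. The Python 2.7 and Python 3.X documentation each list what can be pickled. There are even libraries that give you more pickling power, such as dill. However, I'm not sure pickling a whole module is a good idea, not to mention that your imports are already slow and you now want to serialize them and send them to the parent process on top of that. Even if you manage to do it, it doesn't sound like the best approach.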
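To make the pickling point concrete, here is a small sketch using only the standard library. The helper name `try_pickle` is hypothetical; `json` stands in for any imported module.

```python
import json
import pickle


def try_pickle(obj):
    """Return "ok" if obj can be pickled, else the name of the exception raised."""
    try:
        pickle.dumps(obj)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print(try_pickle(json))        # a module object
    print(try_pickle(json.loads))  # a module-level function
```

The module object itself is rejected with a `TypeError`, while a module-level function such as `json.loads` pickles fine because pickle stores only a reference to its name. The receiving process still has to import the module to resolve that reference, so nothing is saved.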

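Some ideas on how to change your approach:

Try to review which modules you need and why. Maybe you can use other modules that provide similar functionality. Maybe these modules are too heavy and cost more than what you get out of them.
If loading the modules is slow, try making a script that is always running, so that you don't have to start it many times.
If you really need these modules, maybe you can split their use across two processes, so that each process does its own job. For example, one process parses the page, another process processes the result, and so on. That way you speed up loading, but you have to deal with passing messages between the processes.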
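The last idea could look roughly like this: each process imports only what it needs, and only plain data crosses the process boundary. This is a sketch under the same Linux/fork assumption; `parser_worker` and `run_pipeline` are invented names, and `json` is a stand-in for a slow import.

```python
import multiprocessing

# Assumption: Linux, so the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def parser_worker(inbox, outbox):
    import json  # imported only in this process; stand-in for a slow module

    # Keep reading raw strings until the None stop signal arrives.
    for raw in iter(inbox.get, None):
        outbox.put(json.loads(raw)["value"])


def run_pipeline(raw_messages):
    """Send raw JSON strings to the worker and collect the parsed values."""
    inbox, outbox = ctx.Queue(), ctx.Queue()
    worker = ctx.Process(target=parser_worker, args=(inbox, outbox))
    worker.start()
    for raw in raw_messages:
        inbox.put(raw)
    inbox.put(None)  # tell the worker to stop
    results = [outbox.get() for _ in raw_messages]
    worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(['{"value": 1}', '{"value": "x"}']))
```

Only strings and parsed values travel through the queues, so nothing unpicklable is sent. The cost is the extra message-handling code mentioned above.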
