Python child process crashes silently when making an HTTP request

Posted: 2015-06-10 20:13:54

Tags: python python-2.7 python-requests nltk python-multiprocessing

I'm having trouble combining multiprocessing, requests (or urllib2) and nltk. Here is a very simple piece of code:

>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>  # this is the response displayed by the call to `pprint`.

Some more details about this code:

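  1. Import some required modules
  2. Start a child process
  3. From the child process, make an HTTP GET request to 'api.github.com'
  4. Display the result

This works well. The problem appears when I import nltk: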

>>> import nltk
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> # nothing happens!

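After importing NLTK, requests actually crashes the child process silently: if you use a named function instead of the lambda and add some print statements before and after the call, you will see that execution stops inside the call to requests.get. Does anyone know what in NLTK could explain this behavior, and how to work around it?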
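The diagnostic described above (a named function with prints around the call) can be sketched as follows. This is a minimal sketch assuming Python 3 on a Unix-like system; `traced_call` and `run_traced_in_child` are hypothetical helper names. It also returns the child's `exitcode`, which lets the parent tell a normal exit from a child killed by a signal.

```python
import multiprocessing
import os


def traced_call(func, *args):
    # Print markers around the call, as described in the question, so the
    # output shows whether execution ever returns from func.
    print("[pid %d] before call" % os.getpid(), flush=True)
    result = func(*args)
    print("[pid %d] after call: %r" % (os.getpid(), result), flush=True)


def run_traced_in_child(func, *args):
    """Run traced_call(func, *args) in a forked child; return its exit code.

    0 means the child finished normally, a positive value is its exit
    status (1 for an uncaught exception), and -N means it was killed by
    signal N, which is what a "silent" crash looks like from the parent.
    """
    # Use fork explicitly: it matches Python 2.7's behaviour on Unix and is
    # not the default everywhere in Python 3 (macOS has used spawn since 3.8).
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=traced_call, args=(func,) + args)
    child.start()
    child.join()
    return child.exitcode
```

For the question's case one would call `run_traced_in_child(requests.get, 'https://api.github.com')` after `import nltk`: if only the "before call" line appears, the child never returned from `requests.get`, and a negative exit code would indicate it was killed by a signal.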

Here are the versions I'm using:

$> python --version
Python 2.7.5
$> pip freeze | grep nltk
nltk==2.0.5
$> pip freeze | grep requests
requests==2.2.1

I'm running Mac OS X 10.9.5.

Thanks!

3 answers:

Answer 0 (score: 1)

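It seems that using NLTK together with Python Requests in a child process is uncommon. Try using Thread instead of Process: I had exactly the same problem with some other libraries and Requests, and replacing Process with Thread worked for me.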
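The Process-to-Thread swap suggested here can be sketched as below. This is a minimal sketch assuming Python 3; `run_in_thread` is a hypothetical helper, not part of `threading` or `requests`. A thread runs inside the current process, so no `fork()` is involved, and the helper re-raises any exception in the caller instead of letting the thread print it and move on.

```python
import threading


def run_in_thread(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a worker thread and return its result."""
    outcome = {}

    def target():
        # Capture the result or the exception so the caller can see it.
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()  # wait for the worker, like Process.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

For the question's case this would be `run_in_thread(requests.get, 'https://api.github.com')`. The trade-off is that a thread shares the parent's memory and, in CPython, the GIL, so CPU-bound NLTK work would not run in parallel the way it could in separate processes.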

Answer 1 (score: 1)

Updating the Python libraries and Python itself should solve the problem:

alvas@ubi:~$ pip freeze | grep nltk
nltk==3.0.3
alvas@ubi:~$ pip freeze | grep requests
requests==2.7.0
alvas@ubi:~$ python --version
Python 2.7.6
alvas@ubi:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.2 LTS
Release:    14.04
Codename:   trusty
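To compare an environment against the versions listed above from inside Python rather than with `pip freeze`, here is a minimal sketch. It assumes Python 3.8+ (for `importlib.metadata`), and `report_versions` is a hypothetical helper name; `None` means the package is not installed.

```python
import importlib.metadata
import platform


def report_versions(packages=("nltk", "requests")):
    """Return the interpreter version plus {package: version or None}."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

Calling `print(report_versions())` gives the same information as the `python --version` and `pip freeze | grep ...` commands above in one step.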

With this code:

from multiprocessing import Process
import nltk
import time


def child_fn():
    print "Fetch URL"
    import urllib2
    print urllib2.urlopen("https://www.google.com").read()[:100]
    print "Done"


while True:
    child_process = Process(target=child_fn)
    child_process.start()
    child_process.join()
    print "Child process returned"
    time.sleep(1)

[OUT]:

Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned
Fetch URL
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="de"><head><meta content
Done
Child process returned

And with the code from the question:

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> from pprint import pprint
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>

>>> import nltk
>>> Process(target=lambda: pprint(
...         requests.get('https://api.github.com'))).start()
>>> <Response [200]>

It also works with Python 3:

alvas@ubi:~$ python3
Python 3.4.0 (default, Jun 19 2015, 14:20:21) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Process
>>> import requests
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> 
>>> <Response [200]>

>>> import nltk
>>> Process(target=lambda: print(requests.get('https://api.github.com'))).start()
>>> <Response [200]>

Answer 2 (score: 0)

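This problem still appears to be unresolved: https://github.com/nltk/nltk/issues/947. I think it is a serious issue (unless you are just playing with NLTK, doing a POC and trying out models rather than building a real application). I'm running NLP pipelines in RQ workers (http://python-rq.org/) with: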
nltk==3.2.1
requests==2.9.1