Question

I am developing a PyQt program that basically collects data from the internet. In this example, I am trying to fetch data from an RSS page.

Let's assume self.feed is the RSS page containing all the articles, and that "entry" is one article. "entry.url" is the article's original page on the website.

from requests_futures.sessions import FuturesSession

self.session_pages = FuturesSession(max_workers=20)
for entry in self.feed.entries:
    future = self.session_pages.get(entry.url, timeout=10)
    future.add_done_callback(my_call_back)

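That is basically what I am doing. It is embedded in a PyQt thread, and I run several threads at the same time, but I don't think the problem comes from PyQt.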

My problem is that I think the futures do not close their connections, even once they are done. I check it like this:

lsof -i | grep "python" | wc -l

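lsof -i lists the open files involved in network connections. The rest of the command counts them. That number keeps growing (up to something like 900), and then I get the following error: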

(python:28285): GLib-ERROR **: Creating pipes for GWakeup: Too many open files
[1]    28285 trace trap (core dumped)  python gui.py
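
The lsof check can also be approximated from inside the process. The following is only a sketch, assuming Linux or macOS (where /dev/fd lists the descriptors of the current process); count_open_fds and fd_soft_limit are hypothetical helper names, not part of PyQt or requests-futures. Comparing the count with the soft limit shows how close the program is to "Too many open files".

```python
import os
import resource
import socket


def count_open_fds():
    """Count descriptors open in this process (assumes /dev/fd exists)."""
    # Listing the directory briefly opens one extra descriptor itself,
    # so compare counts with each other rather than reading them as exact.
    return len(os.listdir("/dev/fd"))


def fd_soft_limit():
    """Return the soft limit on open files for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


before = count_open_fds()
socks = [socket.socket() for _ in range(5)]  # five unclosed sockets
during = count_open_fds()
for s in socks:
    s.close()
after = count_open_fds()

print("limit:", fd_soft_limit())
print("before/during/after:", before, during, after)
```

Logging count_open_fds() after each batch of requests would show whether the descriptors are released when the futures finish.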

I think the problem comes from the futures, but I am not actually sure.

I have tried something like:

self.session_pages.shutdown()

at the end of the thread, but it did not help.

Do you have any idea?

Answer 1

I don't see a FuturesSession in Python's concurrent.futures, so I am making some assumptions here.

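Unless the callback is unique to each self.session_pages.get(...) call, I think the line future.add_done_callback(my_call_back) may be creating a new callback and overwriting the callback's object id, or may simply be incorrect?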
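
On the callback question, a minimal sketch with plain concurrent.futures (not requests-futures itself; fetch_length and the URLs are made up for illustration) suggests that registering the same function on many futures is fine: each future keeps its own callback list and passes itself to the callback.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

seen = []
seen_lock = threading.Lock()


def fetch_length(url):
    # Hypothetical stand-in for a network request: no I/O is done here.
    return len(url)


def my_call_back(future):
    # Called once per future, with that future as its only argument.
    with seen_lock:
        seen.append(future.result())


urls = ["http://a.example/1", "http://a.example/22", "http://a.example/333"]

# Leaving the with-block waits for all work items, so every callback has run.
with ThreadPoolExecutor(max_workers=3) as executor:
    for url in urls:
        future = executor.submit(fetch_length, url)
        future.add_done_callback(my_call_back)

print(sorted(seen))
```

If that holds for FuturesSession as well, the shared callback is unlikely to be what keeps the sockets open.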

This is the only place I can find a reference to FuturesSession in what you are using:

from pprint import pprint
from requests_futures.sessions import FuturesSession

session = FuturesSession()

def bg_cb(sess, resp):
    # parse the json storing the result on the response object
    resp.data = resp.json()

future = session.get('http://httpbin.org/get', background_callback=bg_cb)
# do some other stuff, send some more requests while this one works
response = future.result()
print('response status {0}'.format(response.status_code))
# data will have been attached to the response object in the background
pprint(response.data)

Try setting background_callback.

Update

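The only method that is overridden/wrapped comes from the Session class (from requests import Session) that FuturesSession inherits from; concurrent.futures itself has no get or request.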

I would try using self.session_pages.request instead of self.session_pages.get, because FuturesSession is composed of a thread pool executor and requests.Session.

Yes, that is the case:

(Pdb) inspect.getmro(FuturesSession)
(<class '__main__.FuturesSession'>, <class 'requests.sessions.Session'>, <class 'requests.sessions.SessionRedirectMixin'>, <class 'object'>)
(Pdb) vars()
{'DEFAULT_POOLSIZE': 10, '__return__': None, '__spec__': None, 'inspect': <module 'inspect' from '/usr/lib/python3.4/inspect.py'>, '__file__': 'requestsfutures.py', 'FuturesSession': <class '__main__.FuturesSession'>, 'HTTPAdapter': <class 'requests.adapters.HTTPAdapter'>, 'ThreadPoolExecutor': <class 'concurrent.futures.thread.ThreadPoolExecutor'>, 'Session': <class 'requests.sessions.Session'>, '__name__': '__main__', '__cached__': None, '__doc__': "\nrequests_futures\n~~~~~~~~~~~~~~~~\n\nThis module provides a small add-on for the requests http library. It makes use\nof python 3.3's concurrent.futures or the futures backport for previous\nreleases of python.\n\n    from requests_futures import FuturesSession\n\n    session = FuturesSession()\n    # request is run in the background\n    future = session.get('http://httpbin.org/get')\n    # ... do other stuff ...\n    # wait for the request to complete, if it hasn't already\n    response = future.result()\n    print('response status: {0}'.format(response.status_code))\n    print(response.content)\n\n", 'pdb': <module 'pdb' from '/usr/lib/python3.4/pdb.py'>, '__loader__': <_frozen_importlib.SourceFileLoader object at 0x7f6d84194470>, '__builtins__': <module 'builtins' (built-in)>, '__package__': None}
(Pdb) vars().keys()
dict_keys(['DEFAULT_POOLSIZE', '__return__', '__spec__', 'inspect', '__file__', 'FuturesSession', 'HTTPAdapter', 'ThreadPoolExecutor', 'Session', '__name__', '__cached__', '__doc__', 'pdb', '__loader__', '__builtins__', '__package__'])
(Pdb) vars()['FuturesSession']
<class '__main__.FuturesSession'>
(Pdb) vars()['FuturesSession'].get
<function Session.get at 0x7f6d80c07488>
(Pdb) vars()['Session'].get
<function Session.get at 0x7f6d80c07488>

Answer 2

OK, you are right, @jm_____. The get() call is just a call to requests.get. So I used the answer from here:

Python-Requests close http connection

More specifically:

future = self.session_pages.get(url, timeout=20, headers={'Connection':'close'})

Now lsof reports a normal number. Thanks.

FuturesSession: socket connections are not closed
