Over HTTPS

Time: 2016-03-27 04:06:51

Tags: python https gevent greenlets grequests

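The following code sends a request every 200 ms and is supposed to handle the responses asynchronously as they arrive.

Over HTTP it works as expected: a request is sent every 200 ms, and the response callback is invoked independently whenever a response arrives. Over HTTPS, however, requests are significantly delayed whenever a response arrives (even though my response handler does no work). The response callback also seems to be called twice for each request, once with a zero-length response (edit: this is because of a redirect and appears to be unrelated to the blocking problem, thanks Padraic).

What could be causing this blocking behaviour over HTTPS? (www.bbc.co.uk is just an example that is geographically far from me, but it happens with every server I have tested.)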

grequests_test.py

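import time
import sys
import grequests
import gevent

# Response hook: print the arrival time and body length of each response.
def cb(res, **kwargs):
    print("**** Response", time.time(), len(res.text))

# Send 10 requests to the URL given on the command line, one every 200 ms.
for i in range(10):
    unsent = grequests.get(sys.argv[1], hooks={'response': cb})
    print("Request", time.time())
    # Send without waiting for the response; the hook fires when it arrives.
    grequests.send(unsent, grequests.Pool(1))
    gevent.sleep(0.2)
# Give outstanding responses time to arrive before exiting.
gevent.sleep(5)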

$ ipython2 grequests_test.py 'http://www.bbc.co.uk' (expected result)

('Request', 1459050191.499266)
('Request', 1459050191.701466)
('Request', 1459050191.903223)
('Request', 1459050192.10403)
('Request', 1459050192.305626)
('**** Response', 1459050192.099185, 179643)
('Request', 1459050192.506476)
('**** Response', 1459050192.307869, 179643)
('Request', 1459050192.707745)
('**** Response', 1459050192.484711, 179643)
('Request', 1459050192.909376)
('**** Response', 1459050192.696583, 179643)
('Request', 1459050193.110528)
('**** Response', 1459050192.870476, 179643)
('Request', 1459050193.311601)
('**** Response', 1459050193.071679, 179639)
('**** Response', 1459050193.313615, 179680)
('**** Response', 1459050193.4959, 179643)
('**** Response', 1459050193.687054, 179680)
('**** Response', 1459050193.902827, 179639)

$ ipython2 grequests_test.py 'https://www.bbc.co.uk' (requests are sent late)

('Request', 1459050203.24336)
('Request', 1459050203.44473)
('**** Response', 1459050204.423302, 0)
('Request', 1459050204.424748) <------------- THIS REQUEST TIME IS LATE
('**** Response', 1459050205.294426, 0)
('Request', 1459050205.296722)
('Request', 1459050205.497924)
('**** Response', 1459050206.456572, 0)
('Request', 1459050206.457875)
('**** Response', 1459050207.363188, 0)
('**** Response', 1459050208.247189, 0)
('Request', 1459050208.249579)
('**** Response', 1459050208.250645, 179643)
('**** Response', 1459050208.253638, 179643)
('Request', 1459050208.451083)
('**** Response', 1459050209.426556, 0)
('Request', 1459050209.428032)
('**** Response', 1459050209.428929, 179643)
('**** Response', 1459050210.331425, 0)
('**** Response', 1459050211.247793, 0)
('Request', 1459050211.251574)
('**** Response', 1459050211.252321, 179643)
('**** Response', 1459050211.25519, 179680)
('**** Response', 1459050212.397186, 0)
('**** Response', 1459050213.299109, 0)
('**** Response', 1459050213.588854, 179588)
('**** Response', 1459050213.590434, 179643)
('**** Response', 1459050213.593731, 179643)
('**** Response', 1459050213.90507, 179643)
('**** Response', 1459050213.909386, 179643)

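Note that the first response appears to arrive long after the next request should have been sent. Why did the sleep not return, and the next request not get sent, before the first response arrived?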
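The delay can be quantified from the printed timestamps. The following is a minimal sketch that flags requests sent noticeably later than the intended 200 ms interval. The helper name `late_requests` and the 100 ms tolerance are assumptions made for illustration; the timestamps are the 'Request' times from the HTTPS run above.

```python
def late_requests(timestamps, interval=0.2, tolerance=0.1):
    """Return the indices of requests sent more than
    interval + tolerance seconds after the previous request."""
    late = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > interval + tolerance:
            late.append(i)
    return late


# 'Request' timestamps copied from the HTTPS run shown above.
https_request_times = [
    1459050203.24336, 1459050203.44473, 1459050204.424748,
    1459050205.296722, 1459050205.497924, 1459050206.457875,
    1459050208.249579, 1459050208.451083, 1459050209.428032,
    1459050211.251574,
]

print(late_requests(https_request_times))  # prints [2, 3, 5, 6, 8, 9]
```

Six of the nine gaps exceed the tolerance, and each late request is printed only a few milliseconds after a response callback, which matches the blocking described.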

2 answers:

Answer 0: (score: 1)

The extra responses and the zero-length responses are easy to explain. If you add print(res.status_code) you will see a lot of 301s: in the case of https://www.bbc.co.uk you are being redirected to http://www.bbc.co.uk, which is why you see the extra responses and 0 for len(res.text). You can see this in the output below:

In [11]: def cb(res, **kwargs):
   ....:     print(res.status_code)
   ....:     print("**** Response", time.time(), len(res.text))
   ....:

In [12]: for i in range(10):
   ....:     unsent = grequests.get("https://www.bbc.co.uk", hooks={'response': cb})
   ....:     print("Request", time.time())
   ....:     grequests.send(unsent, grequests.Pool(1))
   ....:     gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....:
('Request', 1459368704.32843)
301
('**** Response', 1459368704.616453, 0)
('Request', 1459368704.618786)
301
('**** Response', 1459368704.937159, 0)
('Request', 1459368704.941069)
200
('**** Response', 1459368704.943034, 141486)
301
('**** Response', 1459368705.496423, 0)
('Request', 1459368705.498991)
200
('**** Response', 1459368705.50162, 141448)
301
('**** Response', 1459368705.784145, 0)
('Request', 1459368705.785769)
200
('**** Response', 1459368705.786772, 141486)
301
('**** Response', 1459368706.110865, 0)
('Request', 1459368706.114921)
200
('**** Response', 1459368706.116124, 141448)
301
('**** Response', 1459368706.396807, 0)
('Request', 1459368706.400795)
200
301
('**** Response', 1459368706.756861, 0)
('Request', 1459368706.76069)
200
('**** Response', 1459368706.763268, 141448)
('**** Response', 1459368706.488708, 141448)
301
('**** Response', 1459368707.065011, 0)
('Request', 1459368707.069128)
200
('**** Response', 1459368707.071981, 141448)
301
('**** Response', 1459368707.366737, 0)
('Request', 1459368707.370713)
200
('**** Response', 1459368707.373597, 141448)
301
('**** Response', 1459368707.73689, 0)
200
('**** Response', 1459368707.743815, 141448)
200
('**** Response', 1459368707.902499, 141448)

If we run the same code against a site that is served over https, in this example https://www.google.ie/:

In [14]: for i in range(10):
   ....:     unsent = grequests.get("https://www.google.ie/", hooks={'response': cb})
   ....:     print("Request", time.time())
   ....:     grequests.send(unsent, grequests.Pool(1))
   ....:     gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....:
('Request', 1459368895.525717)
200
('**** Response', 1459368895.838453, 19682)
('Request', 1459368895.884151)
200
('**** Response', 1459368896.168789, 19650)
('Request', 1459368896.22553)
200
('**** Response', 1459368896.491304, 19632)
('Request', 1459368896.542206)
200
('**** Response', 1459368896.808875, 19650)
('Request', 1459368896.850575)
200
('**** Response', 1459368897.144725, 19705)
('Request', 1459368897.173744)
200
('**** Response', 1459368897.45713, 19649)
('Request', 1459368897.491821)
200
('**** Response', 1459368897.761675, 19657)
('Request', 1459368897.792373)
200
('**** Response', 1459368898.331791, 19683)
('Request', 1459368898.350483)
200
('**** Response', 1459368898.836108, 19713)
('Request', 1459368898.855729)
200
('**** Response', 1459368899.148171, 19666)

You will see that the behaviour is different: we get 10 responses and no 0-length responses. You should check the status_code in your function to verify that you are getting what you want. The examples above explain what you are seeing with the BBC site, and most likely what is happening with the other sites as well.
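One way to have the callback tell redirects apart from final responses is to branch on the status code. This is a minimal sketch, assuming the hook receives a requests-style response object with `status_code`, `headers` and `text` attributes. The helper name `describe_response` is hypothetical, and stand-in objects are used so that the snippet runs without network access.

```python
import time
from types import SimpleNamespace

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


def describe_response(res):
    """Summarise a response: redirect target, or final status and body length."""
    if res.status_code in REDIRECT_CODES:
        return "redirect %d -> %s" % (res.status_code, res.headers.get("location"))
    return "final %d, %d characters" % (res.status_code, len(res.text))


def cb(res, **kwargs):
    # Same shape as the hook in the question, with the summary added.
    print("**** Response", time.time(), describe_response(res))


# Stand-in objects (assumptions for illustration, not real responses).
redirect = SimpleNamespace(
    status_code=301, headers={"location": "http://www.bbc.co.uk"}, text="")
final = SimpleNamespace(status_code=200, headers={}, text="hello")

cb(redirect)
cb(final)
```

With a hook like this, the zero-length entries in the output are labelled as redirects instead of looking like empty pages.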

Answer 1: (score: 1)

The current version of grequests contains the following:

from gevent import monkey as curious_george
curious_george.patch_all(thread=False, select=False)

The offending part is select=False. Removing it, or calling monkey.patch_select() manually, resolves the problem. I am not sure whether this has any other side effects.
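A minimal sketch of the second workaround: apply gevent's select patch yourself once grequests has been imported. The helper name `ensure_select_patched` is hypothetical, and the import guard is there only so the snippet still runs where gevent is not installed. Whether patching select has other side effects in a given application has not been tested here.

```python
try:
    from gevent import monkey
except ImportError:  # gevent is not installed in this environment
    monkey = None


def ensure_select_patched():
    """Patch the select module with gevent's cooperative version if needed.

    Returns True when select is patched, False when gevent is unavailable.
    """
    if monkey is None:
        return False
    if not monkey.is_module_patched("select"):
        monkey.patch_select()
    return monkey.is_module_patched("select")


# In a real script, call this after `import grequests`, which (per the
# snippet quoted above) runs patch_all(thread=False, select=False).
patched = ensure_select_patched()
print("select patched:", patched)
```

Calling the helper right after the grequests import keeps the workaround in one place and avoids editing the installed library.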