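This Tornado script gets an HTTP 500 response code from some live hosts. Please don't mind the loop; it is there only because I oversimplified my code for this post. I also tried using the IP address instead of the hostname, but to no avail.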
#!/usr/bin/python
import tornado
from tornado import httpclient
from tornado import gen
from tornado.httpclient import AsyncHTTPClient
gloop = tornado.ioloop.IOLoop.instance()
@gen.engine
def process(url):
    print url
    try:
        http_client = httpclient.AsyncHTTPClient()
        request = tornado.httpclient.HTTPRequest(url=str(url), connect_timeout=5.0, validate_cert = False, request_timeout=5.0, follow_redirects=True)
        response = yield tornado.gen.Task(http_client.fetch, request)
        print url
        print response.code
        if response.error: raise Exception(response.error)
    except Exception as e:
        print e
gloop.add_callback(process, 'http://www.dhlsameday.com')
tornado.httpclient.AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
gloop.start()
Answer 0 (score: 0)
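Leaving aside the untrusted certificate: at first I suspected bot protection, but this site simply has an error-handling problem and requires an Accept-Language header. Even curl fails without it. To get it working, just pass the header: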
headers = {"Accept-Language": "en-US;q=0.7,en;q=0.3"}
request = tornado.httpclient.HTTPRequest(url=str(url), headers=headers, connect_timeout=5.0, validate_cert = False, request_timeout=5.0, follow_redirects=True)
I would also suggest adding more of the common browser headers.
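As a rough sketch of what "more common browser headers" could look like, something along these lines might work. The helper names `browser_headers` and `build_request` are made up for illustration, and the `Accept` and `User-Agent` values are just example strings I picked, not anything this particular site is known to need; only `Accept-Language` comes from the fix above.

```python
def browser_headers(extra=None):
    """Return a dict of browser-like request headers.

    Only Accept-Language is taken from the fix above; the Accept and
    User-Agent values are illustrative assumptions.
    """
    headers = {
        "Accept-Language": "en-US;q=0.7,en;q=0.3",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    }
    if extra:
        # Caller-supplied headers override or extend the defaults.
        headers.update(extra)
    return headers


def build_request(url):
    """Build a Tornado HTTPRequest with the same options as in the question."""
    # Imported here so the header helper can be used without Tornado installed.
    from tornado.httpclient import HTTPRequest

    return HTTPRequest(
        url=str(url),
        headers=browser_headers(),
        connect_timeout=5.0,
        request_timeout=5.0,
        validate_cert=False,
        follow_redirects=True,
    )
```

In the question's `process` function, `request = build_request(url)` would then replace the inline `HTTPRequest(...)` call. If the 500 persists, adding one header at a time via `extra` should help narrow down which one the server actually depends on.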