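First off, I'll say that I don't know any language very well. I took a couple of classes in college and never really followed up. So bear with me.
I'm using Python 2.7 on Windows 7.
I'm trying to write something basic to scrape, parse, and analyze data from a particular website. I've gathered that I should be using requests, BeautifulSoup, and lxml.
It's a secure website. The site uses TLS 1.0. The connection is encrypted using AES_256_CBC, with SHA1 for message authentication and RSA as the key exchange mechanism. None of that means anything to me. Is any of it prohibitive?
The code that's giving me problems: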
from BeautifulSoup import BeautifulSoup
import requests
results = requests.get(url)
This gives the following traceback:
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/untitled/jkh.py", line 6, in <module>
    results = requests.get(url)
  File "C:\Python27\lib\site-packages\requests\api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 431, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:590)
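I tried adding a verify=False parameter, which didn't work. It resulted in the same traceback.
When I try connecting to the site with OpenSSL, forcing tls1 as described here, I get: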
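A likely reason verify=False made no difference: it only turns off certificate checking, while the protocol version offered during the handshake is a separate setting. The sketch below uses the standard ssl module rather than requests itself to show that the two settings are independent. The describe helper is a made-up name for illustration, and mapping verify=False to CERT_NONE is my understanding of what requests does, not something confirmed here.

```python
import ssl


def describe(ctx):
    """Return the two settings of interest as a (protocol, verify_mode) pair."""
    return ctx.protocol, ctx.verify_mode


# Default client context: server certificates are verified.
ctx = ssl.create_default_context()
protocol_before, verify_before = describe(ctx)

# Roughly what verify=False asks for: skip certificate checks.
# check_hostname has to be switched off before verify_mode can be CERT_NONE.
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
protocol_after, verify_after = describe(ctx)

print(verify_before != verify_after)      # certificate checking changed
print(protocol_before == protocol_after)  # protocol selection did not
```

If the server drops the connection during protocol negotiation, no certificate setting is reached at all, which would fit with getting the same traceback either way.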
Verify return code: 20 (unable to get local issuer certificate)
followed by a pause of several minutes before giving me:
read:errno 10054
error in s_client
This code works, but I'm not entirely clear on what Sessions are or how I move forward from here. Thanks to Stack Overflow user jasonamyers
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    # Transport adapter that forces TLS 1.0 on the connections it handles.
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

import requests
s = requests.Session()
# Send every https:// request made through this session via MyAdapter.
s.mount('https://', MyAdapter())
Help?
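Edit: It looks like I have everything I need. This code works:
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    # Transport adapter that forces TLS 1.0 on the connections it handles.
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

import requests
s = requests.Session()
# The first argument to mount is a URL prefix; 'https://' covers the whole site.
s.mount('https://', MyAdapter())
# Requests go through the session (not requests.get) so the adapter is used.
results = s.get(url)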
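On the question of what Sessions and mount actually do: as far as I can tell, a Session keeps a table of URL prefixes and adapters, and each request goes through the adapter whose prefix matches the start of the URL, with the most specific prefix winning. The sketch below is a simplified model of that idea, not the requests source. PrefixRouter, the adapter labels, and the example.com URLs are all made up for illustration.

```python
class PrefixRouter(object):
    """Hypothetical stand-in for a Session's adapter table (not requests code)."""

    def __init__(self):
        self.adapters = {}

    def mount(self, prefix, adapter):
        # Register an adapter for every URL that starts with this prefix.
        self.adapters[prefix] = adapter

    def get_adapter(self, url):
        # Try longer prefixes first so the most specific mount wins.
        for prefix in sorted(self.adapters, key=len, reverse=True):
            if url.lower().startswith(prefix.lower()):
                return self.adapters[prefix]
        raise ValueError("No adapter mounted for %r" % url)


router = PrefixRouter()
router.mount("https://", "default https adapter")
router.mount("http://", "default http adapter")
router.mount("https://example.com/", "TLSv1 adapter")

print(router.get_adapter("https://example.com/page"))  # TLSv1 adapter
print(router.get_adapter("https://other.example/"))    # default https adapter
```

Under this model, mounting on 'https://' applies the TLS 1.0 adapter to every secure URL the session fetches, while mounting on the site's full address would limit it to that one site. It also shows why the request has to be made with s.get rather than requests.get: only the session knows about the mounted adapter.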