I'm trying to scrape JavaScript-generated data from a website. To do that, I'm using the Python library requests_html.
Here is my code:
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://myurl'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
payload = {'mylog': 'root', 'mypass': 'root'}
r = session.post(url, headers=headers, verify=False, data=payload)
r.html.render()
load = r.html.find('#load_span', first=True)
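print(load.text)
If I don't call render(), I can connect to the website and the data I scrape is empty (which is expected, since the JavaScript never runs), but when I do call it, I get this error: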
pyppeteer.errors.PageError: net::ERR_CERT_COMMON_NAME_INVALID at https://myurl
I assume render() ignores the verify=False argument passed to session.post. What should I do?
Edit: to reproduce the error:
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://wrong.host.badssl.com'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = session.post(url, headers=headers, verify=False)
r.html.render()
load = r.html.find('#content', first=True)
print(load)
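One possible direction, sketched under assumptions and not confirmed against this setup: render() loads the page in a separate headless Chromium through pyppeteer, so a verify=False passed to a single session.post call would not reach that browser. In recent requests_html releases (an assumption to check against the installed version), the HTMLSession constructor accepts a verify argument and launches the browser with certificate errors ignored when it is False. The helper name render_page and its parameters are hypothetical.

```python
def render_page(url, selector):
    """Render `url` with certificate checks disabled and return the first match.

    Hypothetical helper. Assumes the installed requests_html accepts
    `verify` on the HTMLSession constructor and applies it to the
    headless browser as well as to the plain HTTP requests.
    """
    # Imported lazily so that defining the helper does not itself
    # require requests_html to be installed.
    from requests_html import HTMLSession

    # verify=False is set on the session, not on an individual request,
    # so it is in place before the browser is first launched.
    session = HTMLSession(verify=False)
    r = session.get(url)

    # render() starts (or reuses) the session's Chromium instance.
    r.html.render()
    return r.html.find(selector, first=True)
```

Calling render_page('https://wrong.host.badssl.com', '#content') would exercise the reproduction case above. Certificate checks are then off for every request made through that session, so this only fits hosts you already trust.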