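I'm trying to fetch HTML from a URL using requests_html in Python, but every URL I try returns status code 412. Is anyone familiar with this pip package, and could you help me work out what is going wrong?
Here is my code: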
from requests_html import HTMLSession

urls = ['http://www.cdtf.gov.cn/cdtf/c130542/2019-12/27/content_df16d2c291984ad6923f624ef6de4fbe.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/25/content_86cb4d2a82d74814af08e25444cdf2ca.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/06/content_4dedabd7d30b41ae8d5fa7a484749bd5.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/06/content_2c959180b33f4ab0aa32602908dd816a.shtml']

# Headers sent with every request
header = {
    'Accept': 'image/webp,*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.7,zh-CN;q=0.3',
    'Connection': 'keep-alive',
    'Cookie':
        '''azSsQE5NvspcS=5tsIGOySQFij6ZSSdFddJGIJ7ogz9h64pMGRtQMJ0zpKwtqYGN4f9ZLb2qTna9OQbHdxIH_6NyFFXWjZd0leedA; Path=/; expires=Thu, 23 May 2030 09:13:13 GMT; HttpOnly''',
    'Host': 'www.cdtf.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.96 Safari/537.36',
}


def main():
    session = HTMLSession()
    for url in urls:
        print('Url: %s' % url)
        response = session.get(url, headers=header)
        print('Status code: %s' % response.status_code)
        if response.status_code == 200:
            # Render the page's JavaScript, then print the HTML of every element
            response.html.render()
            elements = response.html.find()
            if elements:
                for element in elements:
                    print(element.html)
    print('*** DONE ***')


if __name__ == "__main__":
    main()
Am I missing something? Any help would be greatly appreciated!
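
One detail in the code above that may be worth ruling out: the Cookie value includes Path, expires and HttpOnly, which are attributes of a Set-Cookie response header, whereas a Cookie request header normally carries only name=value pairs. Whether this has anything to do with the 412 is unclear. The sketch below uses the standard library's http.cookies to reduce such a string to its name=value part; the helper name cookie_header_from_set_cookie and the sample value are made up for illustration.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value: str) -> str:
    """Keep only the name=value pairs of a Set-Cookie style string.

    Hypothetical helper for illustration; it drops attributes such as
    Path, expires and HttpOnly, which belong to Set-Cookie responses.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Sample value shaped like the one in the question (shortened, made up)
raw = "sid=abc123; Path=/; expires=Thu, 23 May 2030 09:13:13 GMT; HttpOnly"
print(cookie_header_from_set_cookie(raw))
```

The result could then be assigned to header['Cookie'] before calling session.get(url, headers=header), to check whether the status code changes.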