Python: fetching content with requests_html, but getting error 412

Time: 2020-05-25 09:56:46

Tags: python web-scraping python-requests-html

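I am trying to fetch the HTML of a URL with requests_html in Python, but every URL I try returns status code 412. Is anyone familiar with this pip package who could help me work out what is going wrong?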

Here is my code:

from requests_html import HTMLSession

urls = ['http://www.cdtf.gov.cn/cdtf/c130542/2019-12/27/content_df16d2c291984ad6923f624ef6de4fbe.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/25/content_86cb4d2a82d74814af08e25444cdf2ca.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/06/content_4dedabd7d30b41ae8d5fa7a484749bd5.shtml',
        'http://www.cdtf.gov.cn/cdtf/c130542/2019-12/06/content_2c959180b33f4ab0aa32602908dd816a.shtml']

header = {
    'Accept': 'image/webp,*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.7,zh-CN;q=0.3',
    'Connection': 'keep-alive',
    'Cookie':
    '''azSsQE5NvspcS=5tsIGOySQFij6ZSSdFddJGIJ7ogz9h64pMGRtQMJ0zpKwtqYGN4f9ZLb2qTna9OQbHdxIH_6NyFFXWjZd0leedA; Path=/; expires=Thu, 23 May 2030 09:13:13 GMT; HttpOnly''',
    'Host': 'www.cdtf.gov.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.96 Safari/537.36',
}


def main():
    session = HTMLSession()

    for url in urls:
        print('Url: %s' % url)
        response = session.get(url, headers=header)

        print('Status code: %s' % response.status_code)

        if response.status_code == 200:
            response.html.render()

            elements = response.html.find()

            if elements:
                for element in elements:
                    print(element.html)

    print('*** DONE ***')


if __name__ == "__main__":
    main()

These are the headers I see in the browser: (headers screenshot)

Am I missing something? Any help would be greatly appreciated!
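
One detail that may be worth checking against the browser headers: the 'Cookie' value in the header dictionary above includes Path, expires and HttpOnly. Those are attributes of a Set-Cookie response header, whereas a Cookie request header normally carries only name=value pairs. The sketch below shows one way to trim such a string down to the pair. The helper name cookie_header_value and the sample cookie are hypothetical, chosen for illustration, and nothing here confirms that this is what triggers the 412.

```python
def cookie_header_value(set_cookie_line: str) -> str:
    """Return only the leading name=value pair of a Set-Cookie style string.

    Hypothetical helper: everything after the first ';' (Path, expires,
    HttpOnly, ...) is a response-side attribute and is dropped.
    """
    return set_cookie_line.split(";", 1)[0].strip()


# Made-up sample value, shaped like the cookie string in the question.
raw = "sessionid=abc123; Path=/; expires=Thu, 23 May 2030 09:13:13 GMT; HttpOnly"

# Only the name=value pair ends up in the request header.
headers = {"Cookie": cookie_header_value(raw)}
print(headers)
```

The resulting dictionary could be merged into the existing header dictionary before calling session.get(url, headers=header). If the server still answers 412, the cause presumably lies elsewhere.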

0 answers:

No answers