Question

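When I run my code locally, fetching data from a URL and parsing it to text works fine.

When I run exactly the same code on a remote server, fetching data from the URL fails with HTTP Error 403: Forbidden.

The answers to the questions "HTTP error 403 in Python 3 Web Scraping" and "urllib2.HTTPError: HTTP Error 403: Forbidden" helped me when I ran the code locally, and everything works fine there.

Do you know what is different about fetching data from a remote server when the code is identical locally and on the server? The code is run the same way in both places, yet the results are completely different.

The URL I want to fetch: url=https://bithumb.cafe/notice

The code I tried to use to fetch the data (it worked once, but not the second time):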

try:
    request = urllib.request.Request(url)

    request.add_header('User-Agent', 'cheese')
    logger.info("request: {}".format(request))

    content = urllib.request.urlopen(request).read()
    logger.info('content: {}'.format(content))

    decoded = content.decode('utf-8')
    logger.info('content_decoded: {}'.format(decoded))

    return decoded
except Exception as e:
    logger.error('failed with error message: {}'.format(e))
    return ''
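
The `except Exception` handler above logs only the error message, which hides what the server actually sent back. Below is a minimal sketch, assuming the failure surfaces as `urllib.error.HTTPError`. The helper names `describe_http_error` and `fetch` are hypothetical and not part of the original code. The idea is to collect the status code, response headers, and the start of the response body, so that the output from the local machine and from the remote server can be compared. Headers such as `Server` may hint at whether something in front of the site rejected the request, though that is not guaranteed.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def describe_http_error(err, body_limit=500):
    """Summarize an HTTPError (hypothetical helper, for illustration only)."""
    body = err.read(body_limit)  # error responses usually carry a body too
    return {
        "code": err.code,
        "reason": err.reason,
        "headers": dict(err.headers.items()),
        "body_start": body.decode("utf-8", errors="replace"),
    }


def fetch(url, user_agent="cheese"):
    """Fetch url as text; on an HTTP error, log the details and return ''."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        logger.error("HTTP error details: %s", describe_http_error(err))
        return ""
```

Running `fetch(url)` in both environments and comparing the logged dictionaries shows whether the two 403 responses (if any) come from the same place.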

A second way of fetching the data (it also works locally but not on the remote server):

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

Method:

try:
    opener = AppURLopener()
    response = opener.open(url)
    logger.info("request response: {}. response type: {}. response_dict: {}"
                .format(response, type(response), response.__dict__))
    html_response = response.read()
    logger.info("html_response: {}".format(html_response))
    encoding = response.headers.get_content_charset('utf-8')
    decoded_html = html_response.decode(encoding)
    logger.info('content_decoded: {}'.format(decoded_html))
    return decoded_html
except Exception as e:
    logger.error('failed with error message: {}'.format(e))
    return ''
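
`FancyURLopener` has been deprecated since Python 3.3. Below is a minimal sketch of the same idea (a custom `User-Agent` on every request) using `urllib.request.build_opener` instead. The helper names `make_opener` and `fetch_html` are hypothetical. It sends the same header as the class above, so it is not expected to change the 403 on the server by itself.

```python
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Return an opener that sends user_agent with every request."""
    opener = urllib.request.build_opener()
    # addheaders replaces the default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch_html(opener, url):
    """Read url and decode it with the charset the server declares."""
    with opener.open(url) as response:
        # Fall back to UTF-8 when the response declares no charset.
        encoding = response.headers.get_content_charset("utf-8")
        return response.read().decode(encoding)
```

Usage would look like `fetch_html(make_opener(), url)`, with error handling wrapped around it as in the method above.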

HTTP Error 403: Forbidden when fetching HTML source on the server

0 answers: