Question

I am trying to open and parse the following URL in Python 3.5 to collect some comments for my assignment. This is my code:

{{1}}

This is the error:

Traceback (most recent call last):
  File "/Users/maryamzolnoori/Dropbox/Dissertation/Programming/Web-Crawl/Askapatient_collect_comments.py", line 12, in <module>
    home_page = urlopen(req).read()
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I even tested it in Python 2.7, and it failed there as well. The error is:

urllib2.HTTPError: HTTP Error 416: Requested Range Not Satisfiable

Answer 1

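You are getting 403 Forbidden most likely because the request's User-Agent header identifies the client as Python (urllib sends Python-urllib/3.5 by default). Try setting the User-Agent as if you were a browser.

For example: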

from urllib.request import Request, urlopen

url = "http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500"

# Send a browser-like User-Agent instead of urllib's default "Python-urllib/3.x".
req = Request(
    url,
    data=None,  # no request body, so this is a GET
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)

# Fetch the page and decode the response bytes to text.
home_page = urlopen(req)
print(home_page.read().decode('utf-8'))
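
To see what the server receives when no header is set, the default User-Agent can be inspected without making any network request. This is only a small sketch: the URL http://example.com/ and the short "Mozilla/5.0" string are placeholders chosen for illustration, not values from the question.

```python
from urllib.request import Request, build_opener

# An opener built with defaults carries the headers urllib adds on its own.
opener = build_opener()
default_headers = dict(opener.addheaders)
print(default_headers["User-agent"])  # starts with "Python-urllib/"

# A header set on the Request itself takes precedence over the opener default.
# The URL and User-Agent value here are placeholders for illustration.
req = Request("http://example.com/", headers={"User-Agent": "Mozilla/5.0"})
print(req.get_header("User-agent"))  # urllib stores the key as "User-agent"
```

Note that urllib normalizes header names with str.capitalize(), which is why the lookup key is "User-agent" rather than "User-Agent".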

It is also a good idea to decode the response with the proper encoding rather than assuming one.
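
One way to pick the encoding is to read the charset the server declares in its Content-Type header and fall back to UTF-8 when none is given. The sketch below assumes that approach; decode_body and fetch_text are hypothetical helper names, and the default "Mozilla/5.0" User-Agent is a placeholder.

```python
from email.message import Message
from urllib.request import Request, urlopen


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in the Content-Type header."""
    # get_content_charset() returns None when no charset parameter is present.
    charset = headers.get_content_charset() or fallback
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The server named an encoding Python does not know; use the fallback.
        return raw.decode(fallback, errors="replace")


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Fetch a URL with a browser-like User-Agent and return decoded text."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as response:
        # response.headers is a Message subclass, so decode_body accepts it.
        return decode_body(response.read(), response.headers)


# Offline check of the decoding step with a hand-built header object.
sample_headers = Message()
sample_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(decode_body("café".encode("latin-1"), sample_headers))
```

Calling fetch_text(url) with the URL from the answer would then replace the hard-coded decode('utf-8'). Pages that declare their encoding only in an HTML meta tag are not covered by this sketch.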

Python 3.5 cannot open URL: error (HTTP 403)
