我正在尝试打开并解析Python 3.5中的以下URL,以收集我的作业的一些注释。这是我的代码:
Traceback (most recent call last):
File "/Users/maryamzolnoori/Dropbox/Dissertation/Programming/Web-Crawl/Askapatient_collect_comments.py", line 12, in <module>
home_page = urlopen(req).read()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
这就是错误:
urllib2.HTTPError: HTTP Error 416: Requested Range Not Satisfiable
我甚至在python 2.7中测试过它并且失败了。错误是:
{{1}}
答案 0 :(得分:1)
你得到403被禁止,很可能是由于用户代理是python。尝试设置用户代理,就像您是浏览器一样。
例如:
from urllib.request import Request, urlopen
url = "http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500"
req = Request(
url,
data=None,
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
)
home_page = urlopen(req)
print(home_page.read().decode('utf-8'))
使用适当的编码也是一个好主意。