I want to download www.nasdaq.com/symbol/c/stock-report
as a file.
Method 1:
from urllib.request import urlopen
url=r'http://www.nasdaq.com/symbol/c/stock-report'
urlopen(url)
This fails with an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "D:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, response)
  File "D:\Python34\lib\urllib\request.py", line 571, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python34\lib\urllib\request.py", line 499, in error
    return self._call_chain(*args)
  File "D:\Python34\lib\urllib\request.py", line 433, in _call_chain
    result = func(*args)
  File "D:\Python34\lib\urllib\request.py", line 579, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Method 2:
wget -c 'http://www.nasdaq.com/symbol/c/stock-report'
'http://www.nasdaq.com/symbol/c/stock-report': Unsupported scheme.
How can I download it automatically from a program?
Answer 0 (score: 0)
I tried requests
and it works: I got some HTML data and status code 200.
import requests
r = requests.get('http://www.nasdaq.com/symbol/c/stock-report')
print(r.status_code)  # print() call, since the question uses Python 3.4
print(r.text)
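However, HTTP Error 403: Forbidden
probably means the server has classified you as a robot/hacker/terrorist etc.
and refuses to let you access its pages.
Your script has to behave like a browser: User-Agent header, cookies, session ID, response timing, and so on.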
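As a minimal sketch of the "behave like a browser" idea applied to Method 1: by default urllib identifies itself as Python-urllib/3.x, which some servers reject. The code below sends a browser-like User-Agent header instead and streams the response into a file. The names build_request, save_page and BROWSER_UA, and the User-Agent string itself, are illustrative assumptions and not part of the original post. There is no guarantee this particular site will accept the request, since it may also check cookies or other headers.

```python
import shutil
from urllib.request import Request, urlopen

# Assumption: an illustrative browser User-Agent string; any common one may do.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def save_page(url, path, timeout=30):
    """Fetch url and stream the response body into the file at path."""
    request = build_request(url)
    with urlopen(request, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Calling save_page('http://www.nasdaq.com/symbol/c/stock-report', 'stock-report.html') would then write the page to stock-report.html in the current directory, or raise urllib.error.HTTPError if the server still refuses the request.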