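Why can't I scrape data from a website using BeautifulSoup? I'm getting a timeout error

Time: 2018-11-06 14:24:46

Tags: python beautifulsoup python-requests urllib2 urlopen

I am trying to fetch data from the following website, but I get the error below. Please find the code below.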

from urllib2 import urlopen
import bs4 as bs
response = urlopen('http://www.mec.ac.in/mec/stats2018.php')
html = response.read()
soup = bs.BeautifulSoup(html,'lxml')
print soup.title

Please find the error below:

Traceback (most recent call last):
  File "et.py", line 3, in <module>
    response = urlopen('http://www.mec.ac.in/mec/stats2018.php')
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

How can I get past this error and retrieve the data?

1 Answer:

Answer 0 (score: 2)

The server is specifically "blocking" requests whose User-Agent header contains the string Python-urllib (which urllib/urllib2 sends by default):

urllib
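If the User-Agent check is the only thing rejecting the request, one possible workaround is to send a different header value. The following is a minimal sketch, not a guaranteed fix: BROWSER_UA, build_request and fetch_html are hypothetical names introduced here for illustration, the header string is an arbitrary browser-like value, and the server may apply other checks as well.

```python
# Sketch: replace the default "Python-urllib/x.y" User-Agent with a custom one.
try:
    from urllib2 import Request, urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import Request, urlopen   # Python 3

# Assumption: an arbitrary browser-like string; the exact value is illustrative.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Fetch the page body; HTTPError is still raised if the server refuses."""
    response = urlopen(build_request(url), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()

# Possible usage (performs a real network request, so it is left commented out):
# html = fetch_html('http://www.mec.ac.in/mec/stats2018.php')
# soup = bs.BeautifulSoup(html, 'lxml')
```

Note that the parser receives the html bytes returned by read(), not the response object, because a response can only be read once.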