I set a User-Agent to scrape a secure website but still get this error: raise HTTPError(req.full_url, code, msg, hdrs, fp) HTTPError: Forbidden

Asked: 2018-03-31 05:19:18

Tags: python-3.x web-scraping beautifulsoup python-requests http-error

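I am scraping the HTML content of the secure (HTTPS) site justdial.com and parsing it into a CSV file. Even though I set a User-Agent header, I still get this error: raise HTTPError(req.full_url, code, msg, hdrs, fp) HTTPError: Forbidden. My code is: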

import urllib.request
from bs4 import BeautifulSoup

# Browser-like User-Agent; adjacent string literals are joined into one value
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/39.0.2171.95 Safari/537.36'}

url = 'https://www.justdial.com/Mumbai/311/B2b_fil'

# First request: sent through a Request object that carries the headers
req = urllib.request.Request(url, None, headers)
response = urllib.request.urlopen(req)
print(response.read())

# Second request: called with the bare url, not req (the line named in the traceback)
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html)

After printing the response, I need to parse the HTML content into a CSV file, but it gives this error:

  File "<ipython-input-21-c589d79bf43d>", line 1, in <module>
runfile('C:/Users/justdial.py', wdir='C:/Users')

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)

  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/justdial.py", line 21, in <module>
html= urllib.request.urlopen(url).read()

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 532, in open
response = meth(req, response)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)

  File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: Forbidden
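
The traceback points at the second urlopen(url) call, which receives the bare URL string rather than req. That request therefore goes out with urllib's default User-Agent (Python-urllib/3.x) instead of the browser-like one. Below is a minimal sketch of one possible fix, assuming the site only objects to the default User-Agent (it may apply other checks as well): build the request once, send it with the headers, and reuse the returned bytes. The names HEADERS, build_request, fetch_html and the opener parameter are illustrative and not part of the original post.

```python
import urllib.request
from urllib.error import HTTPError

# User-Agent string copied from the question; any mainstream browser
# string is assumed to work equally well.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/39.0.2171.95 Safari/537.36"
    )
}


def build_request(url, headers=HEADERS):
    """Return a Request object that carries the custom headers."""
    return urllib.request.Request(url, data=None, headers=headers)


def fetch_html(url, opener=urllib.request.urlopen):
    """Fetch url once, with the headers attached, and return the body as bytes.

    opener is a hypothetical injection point so the function can be
    exercised without network access; it defaults to urlopen.
    """
    req = build_request(url)
    try:
        with opener(req) as response:
            return response.read()
    except HTTPError as err:
        # Still failing here would mean the server rejects the request
        # even with the User-Agent header set.
        print("Request failed:", err.code, err.reason)
        raise
```

With this, html = fetch_html(url) is read a single time and can be passed to BeautifulSoup(html, 'html.parser'); naming the parser explicitly also avoids the warning bs4 emits when none is given.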

0 answers:

No answers yet