Python 3.5 urllib.request 403 Forbidden Error

Time: 2016-12-19 02:20:59

Tags: python-3.x beautifulsoup urllib http-status-code-403

import urllib.request
import urllib
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

print(soup.title)

I am trying to access the website above, but the code keeps throwing a 403 Forbidden error.

Any ideas?

  

C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

1 answer:

Answer 0 (score: 6)

import requests
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
headers = {'User-Agent': 'Mozilla/5.0'}
# Pass the headers explicitly; otherwise requests sends its default User-Agent.
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")

print(soup.title)

Out:

<title>BrightScope Ratings</title>

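First, use requests instead of urllib.

Then, add headers to the request. Otherwise the site will block you, because the default User-Agent identifies the client as a script or crawler, which the site does not like.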
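If you would rather stay with urllib, the same idea can be sketched with urllib.request.Request, which accepts a headers dict. This is only a minimal sketch: the helper names (default_user_agent, build_request, fetch_html) are made up for illustration, the User-Agent value is the one used above, and whether the site accepts it today is an assumption.

```python
import urllib.request

# Same header value as in the answer above; the site may require more than this.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0"}


def default_user_agent():
    """Return the User-Agent urllib sends when none is set (hypothetical helper)."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs, e.g. ('User-agent', 'Python-urllib/3.x')
    return dict(opener.addheaders)["User-agent"]


def build_request(url):
    """Wrap the URL in a Request carrying a browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url, timeout=10):
    """Download the page and return its body as text. Performs a network call."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Printing default_user_agent() shows the Python-urllib identifier that likely triggered the 403. The text returned by fetch_html(url) can be passed to BeautifulSoup(html, "html.parser") exactly as in the question.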