Getting "raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 403: Forbidden"

Date: 2017-07-06 09:11:29

Tags: python-2.7 web-scraping beautifulsoup urllib2

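import urllib2
import BeautifulSoup  # BeautifulSoup 3 (Python 2 only)

request = urllib2.Request("https://adexchanger.com/searchresults/?q=digital%20marketing")
response = urllib2.urlopen(request)  # raises urllib2.HTTPError: HTTP Error 403: Forbidden

soup = BeautifulSoup.BeautifulSoup(response)
for a in soup.findAll('a'):
    # Not every <a> tag has an href attribute; .get() avoids a KeyError
    if 'digital marketing' in a.get('href', ''):
        print a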
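The traceback in the title comes from urlopen raising urllib2.HTTPError when the server answers 403. A minimal sketch, assuming you would rather report the status than crash: fetch is a hypothetical helper name, and its opener parameter exists only so the network call can be swapped out (for example in tests). It imports from urllib2 on Python 2.7 and falls back to urllib.request / urllib.error on Python 3.

```python
try:
    from urllib2 import Request, urlopen, HTTPError   # Python 2.7
except ImportError:
    from urllib.request import Request, urlopen       # Python 3
    from urllib.error import HTTPError


def fetch(url, headers=None, opener=urlopen):
    """Hypothetical helper: return (body, None) on success, or
    (None, description) when the server answers with an HTTP error
    status such as 403, instead of letting the traceback escape."""
    request = Request(url, None, headers or {})
    try:
        response = opener(request)
    except HTTPError as err:
        # err.code is the numeric status, err.msg the reason phrase
        return None, 'HTTP %d %s for %s' % (err.code, err.msg,
                                            request.get_full_url())
    try:
        return response.read(), None
    finally:
        response.close()
```

With the question's headerless request, fetch would be expected to return None for the body and an error string mentioning 403 Forbidden, as long as the site keeps refusing it.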

1 answer:

Answer 0 (score: 0)

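The site apparently blocks bots and crawlers, and urllib2 announces itself with its default User-Agent, "Python-urllib/2.7". To look like a browser, send a Chrome/Mozilla User-Agent header instead. Try the code below.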

>>> import urllib2
>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
>>> req = urllib2.Request('https://adexchanger.com/searchresults/?q=digital%20marketing', None, headers)
>>> urllib2.urlopen(req)
<addinfourl at 140245639765816 whose fp = <socket._fileobject object at 0x7f8d7b865250>>
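The same fix as a small reusable sketch that runs on Python 2.7 and 3. The helper names default_user_agent, browser_request and fetch_html are hypothetical. default_user_agent shows what urllib sends when no header is set, and browser_request swaps in the browser string from the answer. That the site accepts this header is an assumption: it may also check other headers or block by IP.

```python
try:
    from urllib2 import Request, urlopen, build_opener          # Python 2.7
except ImportError:
    from urllib.request import Request, urlopen, build_opener   # Python 3

# Browser User-Agent string taken from the answer; any current browser
# string is assumed to work the same way.
BROWSER_HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/39.0.2171.95 Safari/537.36'),
}


def default_user_agent():
    """User-Agent the default opener sends, e.g. 'Python-urllib/2.7'."""
    return dict(build_opener().addheaders)['User-agent']


def browser_request(url, headers=None):
    """Build a Request that announces a browser User-Agent instead."""
    return Request(url, None, dict(headers or BROWSER_HEADERS))


def fetch_html(url):
    """Download url with browser headers; still raises HTTPError if refused."""
    response = urlopen(browser_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Passing the result to the question's parser, e.g. BeautifulSoup.BeautifulSoup(fetch_html(url)), should then work as long as the server accepts the header. Note that urllib normalizes header names with str.capitalize(), so Request.get_header expects 'User-agent', not 'User-Agent'.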