How to change the user agent in urllib2

Asked: 2018-02-21 21:05:40

Tags: python python-2.7 user-agent

I am trying to access a page with the following code:

page = urllib2.urlopen(full_url)
soup = BeautifulSoup(page, 'html.parser')

li_post_id = "post-" + str(post_id)
li_soup = soup.find('li', attrs={'id':li_post_id})

This works fine on my Ubuntu machine, but when I run it on my Windows server I get a 403 Forbidden error, so I think the problem is the user agent.

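How can I change it to Firefox? I have only seen tutorials that change the user agent with requests, and I don't want to rewrite all my code to use that library.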

2 answers:

Answer 0: (score: 1)

You can try this.

import random

import requests
from bs4 import BeautifulSoup  # the code below calls BeautifulSoup directly


agents= [
'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)',
'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)']

headers = {"User-Agent":random.choice(agents)}
response = requests.get(full_url,headers=headers)
soup = BeautifulSoup(response.text, 'lxml')

Answer 1: (score: 1)

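Changing the headers has nothing to do with BeautifulSoup; it is only used for HTML parsing. You need to change the header in the urllib request, as follows: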

Python 3

import urllib.request

req = urllib.request.build_opener()
req.addheaders = [('User-Agent', 'Some user agent')]
response = req.open('http://www.stackoverflow.com')
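If the existing code calls urlopen() in many places, one option is to install the opener as the process-wide default so those calls do not need to change. The following is a minimal sketch for Python 3; the USER_AGENT string is a placeholder chosen for illustration, not a value from the question, and the network call is left commented out.

```python
import urllib.request

# Placeholder Firefox-style user agent string (an assumption); substitute a current one.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0"

opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", USER_AGENT)]

# Make this opener the default used by urllib.request.urlopen().
urllib.request.install_opener(opener)

# Existing calls can then stay as they are, for example:
# page = urllib.request.urlopen(full_url)
```

In Python 2.7 the same approach should work with urllib2.build_opener() and urllib2.install_opener().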

Python 2.7

import urllib2

req = urllib2.build_opener()
req.addheaders = [('User-Agent', 'Some user agent')]
response = req.open('http://www.stackoverflow.com')
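An alternative that stays close to the code in the question is to wrap the URL in a Request object carrying the header and pass that to urlopen(). The sketch below is written to import from urllib2 on Python 2.7 and from urllib.request on Python 3; build_request and FIREFOX_UA are hypothetical names introduced here, and the user agent string is only a placeholder.

```python
try:
    # Python 2.7
    from urllib2 import Request, urlopen
except ImportError:
    # Python 3
    from urllib.request import Request, urlopen

# Placeholder Firefox-style user agent string (an assumption); substitute a current one.
FIREFOX_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0"


def build_request(url, user_agent=FIREFOX_UA):
    """Return a Request that carries a custom User-Agent header (no network I/O)."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://www.stackoverflow.com")

# The request object can be passed wherever the URL string was used before:
# page = urlopen(req)
# soup = BeautifulSoup(page, 'html.parser')
```

Only the line that builds the request is new; the BeautifulSoup parsing that follows can remain unchanged.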