I'm trying to add a User-Agent to my Python 3 code that uses urllib and BeautifulSoup. Here is my code:
import bs4 as bs
import urllib.request
import urllib.parse
from random import choice
from time import sleep
import os
user_agents = [
'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
'Opera/9.25 (Windows NT 5.1; U; en)',
'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:8.0.1) Gecko/20100101 Firefox/8.0.1',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19'
]
allUrlData = ['http://www.bbc.co.uk/news', 'http://www.bbc.co.uk/news/world']
r = range(2,4)
for url in allUrlData:
    sleep(choice(r))
    version = choice(user_agents)
    headers = {'User-Agent': version}
    req = urllib.request.Request(url, None, headers)
    htmlText = urllib.request.urlopen(req).read()
    soup = bs.BeautifulSoup(htmlText, 'lxml')
I'm confused about whether, when I pass the req object to the urlopen() method, the request will still include the User-Agent.
Does this code work correctly and actually send the User-Agent?
Or do I need to use Request.add_header(key, val) to make it work?
Thanks a lot for any help.
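For reference, here is a minimal sketch (no network needed) that I think shows the header is attached either way — note that urllib stores header keys in capitalized form, so the lookup key is 'User-agent'; the example agent string here is just a placeholder:

```python
import urllib.request

# Passing a headers dict to the Request constructor
headers = {'User-Agent': 'Mozilla/5.0 (test)'}
req = urllib.request.Request('http://www.bbc.co.uk/news', None, headers)
# Request capitalizes header keys internally, hence 'User-agent'
print(req.get_header('User-agent'))  # Mozilla/5.0 (test)

# Equivalent result using add_header after construction
req2 = urllib.request.Request('http://www.bbc.co.uk/news')
req2.add_header('User-Agent', 'Mozilla/5.0 (test)')
print(req2.get_header('User-agent'))  # Mozilla/5.0 (test)
```

If both print the same value, it would suggest the constructor and add_header are interchangeable here.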