I'm trying to fetch data from a URL such as "http://www.sears.com/search=refrigerators".
Here is what I tried:
>>> from cookielib import CookieJar
>>> import urllib
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> data = {}
>>> data['search'] = 'refrigerators'
>>> url_values = urllib.urlencode(data)
>>> cj = CookieJar()
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>> url = 'http://www.sears.com'
>>> full_url = url + '/' + url_values
>>> f = opener.open(full_url).read()
>>> soup = BeautifulSoup(f, "html.parser")
>>> print(soup.title)
<title>Shopping Tourism: Shop Internationally at Sears</title>
>>> f = opener.open(full_url).read()
>>> soup = BeautifulSoup(f, "html.parser")
>>> print(soup.title)
<title>Refrigerators from Sears.com</title>
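The two identical requests return different titles instead of the same one :(. (Maybe the first request is giving me the home page title.)
Why does this happen? Please help me get the search page data.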
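For anyone reproducing this on Python 3: `cookielib` and `urllib2` were renamed to `http.cookiejar` and `urllib.request`, and `urllib.urlencode` moved to `urllib.parse.urlencode`. Below is a minimal sketch of the same setup under those names. The helper names `build_search_url` and `make_cookie_opener` are hypothetical, introduced only for illustration, and the URL is joined with a plain '/' exactly as in the attempt above.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
import urllib.request


def build_search_url(base_url, params):
    """Join base_url and the urlencoded params with '/', as in the session above."""
    # Hypothetical helper: mirrors `url + '/' + url_values` from the question.
    return base_url.rstrip("/") + "/" + urlencode(params)


def make_cookie_opener():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Calling `build_search_url('http://www.sears.com', {'search': 'refrigerators'})` yields the same URL as in the question, and `opener.open(full_url).read()` then works as it did with `urllib2`.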
Answer 0 (score: 0)
I'd suggest using a requests Session object, which is the requests counterpart to a CookieJar. This gets the title "Refrigerators from Sears.com":
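import requests
from bs4 import BeautifulSoup

# A Session keeps cookies across requests, much like an opener with a CookieJar.
s = requests.Session()
r = s.get("http://www.sears.com/search=refrigerators")
# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(r.content, "html.parser")
print(soup.title)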
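One possible explanation for the different titles in the question is that the site serves a different page until a cookie from an earlier response is sent back. That is a guess, not confirmed behaviour of sears.com. The sketch below reproduces the effect against a local test server using only the standard library (Python 3). The `DemoHandler` class, the `visited` cookie, and the page titles are all invented for illustration.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: shows a landing page until the client sends a cookie."""

    def do_GET(self):
        has_cookie = "visited=1" in (self.headers.get("Cookie") or "")
        title = "Search results" if has_cookie else "Landing page"
        body = ("<title>%s</title>" % title).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        if not has_cookie:
            # First visit: hand out the cookie the next request must carry.
            self.send_header("Set-Cookie", "visited=1")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice():
    """Request the same URL twice through one cookie-aware opener."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/search=refrigerators" % server.server_port
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore any proxy set in the environment
        urllib.request.HTTPCookieProcessor(CookieJar()),
    )
    try:
        return [opener.open(url).read().decode("utf-8") for _ in range(2)]
    finally:
        server.shutdown()
        server.server_close()
```

In this toy setup `fetch_twice()` returns the landing page title first and the search title second, which matches the pattern in the question. If the real site behaves this way, it would explain why two identical requests through the same opener return different pages.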