Fetching data from a URL using urllib in Python

Time: 2015-07-27 13:14:18

Tags: python request beautifulsoup urllib2

I am trying to fetch data from a URL such as "http://www.sears.com/search=refrigerators".

This is what I tried:

>>> from cookielib import CookieJar
>>> import urllib
>>> import urllib2
>>> from bs4 import BeautifulSoup
>>> data = {}
>>> data['search'] = 'refrigerators'
>>> url_values = urllib.urlencode(data)
>>> cj = CookieJar()
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>> url = 'http://www.sears.com'
>>> full_url = url + '/' + url_values
>>> f = opener.open(full_url).read()
>>> soup = BeautifulSoup(f, "html.parser")
>>> print(soup.title)
<title>Shopping Tourism: Shop Internationally at Sears</title>
>>> f = opener.open(full_url).read()
>>> soup = BeautifulSoup(f, "html.parser")
>>> print(soup.title)
<title>Refrigerators from Sears.com</title>

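The two identical requests return different titles instead of the same one :( (perhaps the first request is returning the home page's title).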

Why does this happen? Please help me get the search page data.

1 answer:

Answer 0 (score: 0)

I'd suggest using a requests Session object, which is that library's counterpart to CookieJar. For me, the following gets the title Refrigerators from Sears.com:

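import requests
from bs4 import BeautifulSoup

# A Session keeps cookies across requests, like the CookieJar-backed opener.
s = requests.Session()

r = s.get("http://www.sears.com/search=refrigerators")

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(r.content, "html.parser")

# The call form of print works on both Python 2 and Python 3.
print(soup.title)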
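
To check the guess in the question (that the first response is really the home page), it may help to record the final URL next to each title. Below is a minimal sketch using only the Python 3 standard library, so it is a port of the question's Python 2 code and not the code anyone above actually ran. The names `TitleParser`, `extract_title`, `build_search_url` and `fetch_titles` are hypothetical helpers made up for this illustration, and decoding the body as UTF-8 is an assumption.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def extract_title(html):
    """Return the stripped <title> text, or None if the page has no title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def build_search_url(base, term):
    """Build the same URL shape as the question: base + '/' + 'search=<term>'."""
    return base + "/" + urlencode({"search": term})


def fetch_titles(url, attempts=2):
    """Open one URL several times with a single cookie-aware opener.

    Returns a list of (final_url, title) pairs, one per attempt. A final_url
    that differs from the requested url means the response was redirected.
    """
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    results = []
    for _ in range(attempts):
        with opener.open(url) as response:
            final_url = response.url
            # Assumption: the page is UTF-8; undecodable bytes are replaced.
            html = response.read().decode("utf-8", errors="replace")
        results.append((final_url, extract_title(html)))
    return results
```

With network access, `fetch_titles(build_search_url("http://www.sears.com", "refrigerators"))` should return one pair per attempt. If the first pair shows a final URL different from the one requested, the first response was a redirect, which would fit the home-page title seen in the question. What the site actually returns today is not something this sketch can promise.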