Python BeautifulSoup soup.findAll(): how to make the search results match up

Time: 2015-11-13 21:13:19

Tags: python

I recently learned BeautifulSoup, and as an exercise I want to use it to read job postings and extract the company and location of each one. My code is:

import urllib
from BeautifulSoup import *

url="http://www.indeed.com/jobs?q=hadoop&start=50"
html=urllib.urlopen(url).read()
soup=BeautifulSoup(html)
company=soup.findAll("span",{"class":"company"})
location=soup.findAll("span",{"class":"location"})

# for c in company:
#   print c.text
# print 
# for l in location:
#   print l.text

print len(company)
print len(location)

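I found that the company list and the location list have different lengths, so I cannot tell which (company, location) pair is incomplete. How can I make them match?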

1 Answer:

Answer 0: (score: 3)

You need to iterate over the search result blocks and get the company and location pair from each block:

for result in soup.find_all("div", {"class": "result"}):  # or soup.select("div.result")
    company = result.find("span", {"class": "company"}).get_text(strip=True)
    location = result.find("span", {"class": "location"}).get_text(strip=True)

    print(company, location)
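
If a result block lacks one of the two spans, `result.find(...)` returns `None` and the `.get_text()` call raises `AttributeError`. The following is a minimal sketch of one way to keep the pairs aligned in that case. The sample markup and the names `SAMPLE_HTML`, `text_or_none` and `extract_pairs` are made up for illustration; the only thing taken from the answer is the assumption that each posting sits in a `div` with class `result`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure assumed in the answer:
# one div.result per posting. The second posting has no location span
# and the third has no company span.
SAMPLE_HTML = """
<div class="result"><span class="company"> Acme </span><span class="location">Austin, TX</span></div>
<div class="result"><span class="company">Globex</span></div>
<div class="result"><span class="location">Remote</span></div>
"""


def text_or_none(parent, css_class):
    """Return the stripped text of the first matching span, or None if absent."""
    node = parent.find("span", {"class": css_class})
    return node.get_text(strip=True) if node is not None else None


def extract_pairs(html):
    """Build one (company, location) tuple per result block."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (text_or_none(result, "company"), text_or_none(result, "location"))
        for result in soup.find_all("div", {"class": "result"})
    ]


pairs = extract_pairs(SAMPLE_HTML)
for company, location in pairs:
    print(company, location)
```

Because every tuple comes from a single result block, a missing value shows up as `None` in the right position instead of shifting the rest of the list.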

You should also switch to BeautifulSoup4; the version you are using is outdated:

pip install beautifulsoup4

and replace:

from BeautifulSoup import *

with:

from bs4 import BeautifulSoup

The loop above prints:

(u'PsiNapse', u'San Mateo, CA')
(u'Videology', u'Baltimore, MD')
(u'Charles Schwab', u'Lone Tree, CO')
(u'Cognizant', u'Dover, NH')
...
(u'Concur', u'Bellevue, WA')
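
The question's code targets Python 2 (`urllib.urlopen`, `print` statements). As a rough sketch of how the same approach might look on Python 3 with BeautifulSoup4, the block below uses `urllib.request` and the CSS selector form mentioned in the answer's comment. The function names `fetch_html` and `company_location_pairs` are illustrative, the explicit `"html.parser"` argument is a choice made here rather than something the answer specifies, and the site's markup may no longer match the `result`, `company` and `location` class names used in 2015.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://www.indeed.com/jobs?q=hadoop&start=50"  # URL from the question


def fetch_html(url):
    """Download the page body as bytes."""
    with urlopen(url) as response:
        return response.read()


def company_location_pairs(html):
    """Return one (company, location) tuple per div.result block."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for result in soup.select("div.result"):
        company = result.select_one("span.company")
        location = result.select_one("span.location")
        pairs.append((
            company.get_text(strip=True) if company is not None else None,
            location.get_text(strip=True) if location is not None else None,
        ))
    return pairs


# Usage (performs a network request, so it is left commented out):
# for company, location in company_location_pairs(fetch_html(URL)):
#     print(company, location)
```

On Python 3 the `print(company, location)` call prints the two values separated by a space rather than a tuple with `u'...'` prefixes.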