Finding specific text with BeautifulSoup

Time: 2014-04-01 14:16:31

Tags: python web-scraping beautifulsoup

I'm trying to get all the winner categories from this page: http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013

I've written this in Sublime Text:

import urllib2
from bs4 import BeautifulSoup
url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
page = urllib2.urlopen(url)
soup_package = BeautifulSoup(page)
page.close()

# Find every <div class="bestOfItem">. This works.
all_categories = soup_package.findAll("div",class_="bestOfItem")
# print(all_categories)

# This part breaks it:
soup = BeautifulSoup(all_categories)
winner = soup.a.string
print(winner)

When I run it in the terminal, I get the following error:

Traceback (most recent call last):
  File "winners.py", line 12, in <module>
    soup = BeautifulSoup(all_categories)
  File "build/bdist.macosx-10.9-intel/egg/bs4/__init__.py", line 193, in __init__
  File "build/bdist.macosx-10.9-intel/egg/bs4/builder/_lxml.py", line 99, in prepare_markup
  File "build/bdist.macosx-10.9-intel/egg/bs4/dammit.py", line 249, in encodings
  File "build/bdist.macosx-10.9-intel/egg/bs4/dammit.py", line 304, in find_declared_encoding
TypeError: expected string or buffer

Does anyone know what's going on there?

1 Answer:

Answer 0 (score: 2)

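You are trying to create a new BeautifulSoup object from a list of elements. findAll returns a list of Tag objects, but the BeautifulSoup constructor expects markup as a string or a file-like object, which is why you get the TypeError: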

soup = BeautifulSoup(all_categories)

There is absolutely no need to do that here; just loop over each match instead:

for match in all_categories:
    winner = match.a.string
    print(winner)
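
For reference, here is a minimal self-contained sketch of the same loop that runs without fetching the page. The HTML string is hypothetical stand-in markup (only the bestOfItem class name comes from the question), and the None checks are an added assumption for divs that contain no link or a link with nested tags. It uses find_all, the current spelling of findAll, and the built-in "html.parser" so lxml is not required:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the
# "bestOfItem" class name is taken from the question.
html = """
<div class="bestOfItem"><a href="/a">Best Pizza</a></div>
<div class="bestOfItem"><a href="/b">Best Coffee</a></div>
<div class="bestOfItem">No link here</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet of Tag objects, not markup,
# so it must be iterated rather than passed back to BeautifulSoup().
all_categories = soup.find_all("div", class_="bestOfItem")

winners = []
for match in all_categories:
    link = match.a  # first <a> inside this div, or None if there is none
    if link is not None and link.string is not None:
        winners.append(link.string.strip())

print(winners)
```

Skipping matches without a usable link avoids an AttributeError on match.a.string when a div has no anchor; whether the real page has such divs is not something the question confirms.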