I am trying to parse the names of all the talks on the http://www.ted.com/talks page. Using BeautifulSoup, this is what I have:
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://www.ted.com/talks")
soup = BeautifulSoup(page)
link = soup.findAll(lambda tag: tag.name == 'a' and tag.findParent('dt', 'thumbnail'))
for anchor in link.findAll('a', title = True):
    print anchor['title']
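The initial link shows a nice array of eight video blocks. I then try to go through it and pull the titles out of the tags with the code above, which gives me the following error: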
for anchor in link.findAll('a', title=True):
AttributeError: 'ResultSet' object has no attribute 'findAll'
What am I doing wrong?
Answer 0 (score: 3)
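link is a ResultSet, that is, a list of Tag objects. findAll is a method of an individual Tag, not of the list, so you need to iterate over the results. For example: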
for anchor in link:
    print anchor['title']
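To try the per-tag idea end to end without fetching the live page, here is a minimal sketch under several assumptions: it targets Python 3, it swaps in the standard library's html.parser in place of BeautifulSoup (so it is not the asker's library), and SAMPLE_HTML, ThumbnailTitleParser and thumbnail_titles are hypothetical names and markup modelled on the dt/thumbnail structure described in the question. It inspects each anchor individually and keeps the title of those nested in a thumbnail dt.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the structure described in the question;
# the real page's HTML may differ.
SAMPLE_HTML = """
<dl>
  <dt class="thumbnail"><a title="Talk One" href="/talks/one.html"><img src="1.jpg"></a></dt>
  <dd><a href="/talks/one.html">Talk One</a></dd>
  <dt class="thumbnail"><a title="Talk Two" href="/talks/two.html"><img src="2.jpg"></a></dt>
</dl>
"""


class ThumbnailTitleParser(HTMLParser):
    """Collect the title of every <a> nested inside <dt class="thumbnail">."""

    def __init__(self):
        super().__init__()
        self.in_thumbnail = False  # True while inside a matching <dt>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "dt":
            # A class attribute may hold several space-separated names.
            classes = (attrs.get("class") or "").split()
            self.in_thumbnail = "thumbnail" in classes
        elif tag == "a" and self.in_thumbnail and attrs.get("title"):
            self.titles.append(attrs["title"])

    def handle_endtag(self, tag):
        # Assumes every <dt> is closed explicitly, as in SAMPLE_HTML.
        if tag == "dt":
            self.in_thumbnail = False


def thumbnail_titles(html):
    """Return the titles of anchors found inside thumbnail <dt> elements."""
    parser = ThumbnailTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


for title in thumbnail_titles(SAMPLE_HTML):
    print(title)
```

The anchor inside the dd element is skipped because it is not inside a thumbnail dt and carries no title attribute, which mirrors the title = True filter in the question.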
Answer 1 (score: 0)
By way of comparison, a pyparsing approach would look like this:
from contextlib import closing
import urllib2
from pyparsing import makeHTMLTags, withAttribute
# pull HTML from web page
with closing(urllib2.urlopen("http://www.ted.com/talks")) as page:
    html = page.read()
# define opening and closing tags
dt,dtEnd = makeHTMLTags("dt")
a,aEnd = makeHTMLTags("a")
# restrict <dt> tag matches to those with class='thumbnail'
dt.setParseAction(withAttribute(**{'class':'thumbnail'}))
# define pattern of <dt> tag followed immediately by <a> tag
patt = dt + a("A")
# scan input html for matches of this pattern, and access
# attributes of the <A> tag
for match,s,e in patt.scanString(html):
    print match.A.title
    print match.A.href
    print
which prints:
Bruce Schneier: The security mirage
/talks/bruce_schneier.html
Harvey Fineberg: Are we ready for neo-evolution?
/talks/harvey_fineberg_are_we_ready_for_neo_evolution.html
Ric Elias: 3 things I learned while my plane crashed
/talks/ric_elias.html
Anil Ananthaswamy: What it takes to do extreme astrophysics
/talks/anil_ananthaswamy.html
John Hunter on the World Peace Game
/talks/john_hunter_on_the_world_peace_game.html
Kathryn Schulz: On being wrong
/talks/kathryn_schulz_on_being_wrong.html
Sam Richards: A radical experiment in empathy
/talks/sam_richards_a_radical_experiment_in_empathy.html
Susan Lim: Transplant cells, not organs
/talks/susan_lim.html
Marcin Jakubowski: Open-sourced blueprints for civilization
/talks/marcin_jakubowski.html
Roger Ebert: Remaking my voice
/talks/roger_ebert_remaking_my_voice.html