I am learning to parse HTML in Python with the BeautifulSoup library. I get an error when I run the following code:
import urllib
from BeautifulSoup import *
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""
soup=BeautifulSoup(html_doc)
print soup.prettify()
print soup.title
print soup.title.name
print soup.title.string
print soup.title.parent.name
print soup.p
print soup.p['class']
print soup.a
print soup.find_all('a')
#for extracting URL's
for link in soup.find_all('a'):
    print link.get('href')
print soup.get_text()
Answer 0 (score: 0)
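You didn't include the error message in your question, so this is a guess.
You are probably using an old version of BeautifulSoup (the one imported with from BeautifulSoup import *), which needs findAll() instead of find_all() and getText() instead of get_text().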
Alternatively, import the newer BeautifulSoup 4 (the bs4 package), which provides find_all() and get_text():
from bs4 import BeautifulSoup
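For reference, here is a minimal sketch of how the question's code might look with the newer import. It assumes Python 3 and that the beautifulsoup4 package is installed; the shortened html_doc, the hrefs variable, and the choice of the "html.parser" backend are illustrative choices, not something stated in the question.

```python
from bs4 import BeautifulSoup

# Shortened version of the sample document from the question.
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>.</p>
"""

# Name the parser explicitly; "html.parser" ships with the standard library.
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)   # text inside the <title> tag
print(soup.p["class"])     # class is multi-valued, so bs4 returns a list

# Collect the URL of every <a> tag using the bs4 method name find_all().
hrefs = [link.get("href") for link in soup.find_all("a")]
for href in hrefs:
    print(href)

# get_text() returns all text in the document with the tags stripped.
print(soup.get_text())
```

If you have to stay on the old library, the same loop should work there with findAll() and getText() in place of find_all() and get_text().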