Question

I'm learning to use the BeautifulSoup library to parse HTML in Python. I get an error with the following code:

import urllib

from BeautifulSoup import *

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

soup=BeautifulSoup(html_doc)

print soup.prettify()

print soup.title

print soup.title.name

print soup.title.string

print soup.title.parent.name

print soup.p

print soup.p['class']

print soup.a

print soup.find_all('a')

# for extracting URLs
for link in soup.find_all('a'):
    print link.get('href')

print soup.get_text()

Please help me fix the code above. I'm using Python 2. The image below shows the error.

Answer 1

You haven't shown the actual error message, so I'm guessing.

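You are probably using the old version of BeautifulSoup (Beautiful Soup 3, which is what `from BeautifulSoup import *` loads). It needs findAll() instead of find_all() and getText() instead of get_text().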
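A minimal sketch of the link-extraction step written with the Beautiful Soup 3 method names. The `try`/`except` import is my own addition, not part of the question: it assumes either Beautiful Soup 4 (`bs4`) or the old Beautiful Soup 3.2 module is installed. Beautiful Soup 4 keeps `findAll()` and `getText()` as backwards-compatible aliases (recent releases may emit a DeprecationWarning for them), so the camelCase spelling should work under either version. No parser name is passed because the Beautiful Soup 3 constructor does not take one.

```python
from __future__ import print_function

try:
    # Beautiful Soup 4 (pip package "beautifulsoup4", module "bs4")
    from bs4 import BeautifulSoup
except ImportError:
    # Beautiful Soup 3 (old "BeautifulSoup" module, Python 2 only)
    from BeautifulSoup import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
</body></html>
"""

soup = BeautifulSoup(html_doc)

# camelCase names: the only spelling Beautiful Soup 3 understands
links = [a.get('href') for a in soup.findAll('a')]
for href in links:
    print(href)

# getText() instead of get_text() for the same reason
text = soup.getText()
print(text)
```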

Or import the new BeautifulSoup (bs4) instead:

from bs4 import BeautifulSoup
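With bs4 installed (`pip install beautifulsoup4`), the question's script could look roughly like this. This is a sketch of my own, not the asker's final code: it uses the `print()` function so it runs on both Python 2.7 and Python 3, and it passes `"html.parser"` (the parser that ships with Python) so bs4 does not have to guess which parser to use. One behavioural difference worth knowing: in bs4, `soup.p['class']` returns a list such as `['title']`, because `class` is treated as a multi-valued attribute.

```python
from __future__ import print_function

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

# Name the parser explicitly; "html.parser" is part of the standard library
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.prettify())
print(soup.title)              # the <title> tag
print(soup.title.name)         # its tag name
print(soup.title.string)       # the text inside <title>
print(soup.title.parent.name)  # name of the enclosing tag
print(soup.p)                  # first <p> tag
print(soup.p['class'])         # multi-valued in bs4, so this is a list
print(soup.a)                  # first <a> tag
print(soup.find_all('a'))      # every <a> tag

# Extract the URLs from every <a> tag
urls = [link.get('href') for link in soup.find_all('a')]
for url in urls:
    print(url)

print(soup.get_text())
```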
