How to use get_text() to extract only the headings in Beautiful Soup (Python)

Date: 2015-01-29 21:12:15

Tags: python python-3.x beautifulsoup

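I want to extract only the headings from this page, but calling get_text() raises an error. What is the fix for this example? Please explain with an example. I am running this code on Python 3.4.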

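import urllib.request
from bs4 import BeautifulSoup

url = "http://www.brecorder.com/"

urls = [url]
visited = [url]
while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
        response = htmltext
    except Exception:
        # Report the URL that failed and skip it; otherwise `response`
        # would be undefined (or left over from a previous page) below.
        print(urls[0])
        urls.pop(0)
        continue

    soup = BeautifulSoup(response, "html.parser")
    urls.pop(0)
    soup = soup.find_all("h2")
    print(soup.get_text())  # this line raises the AttributeError below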

The error is: AttributeError: 'ResultSet' object has no attribute 'get_text'

Alternatively, if I replace this line

soup = soup.find_all("h2")

with this one

soup = soup.select("h2")

the following error occurs:

AttributeError: 'list' object has no attribute 'get_text'

1 answer:

Answer 0 (score: 1)

You are trying to call a method that is defined on individual elements on the whole collection (list) instead.

Both soup.select() and soup.find_all() return a list of elements, not a single element. Use a loop:

for element in soup.select('h2'):
    print(element.get_text())

Or you can apply the method to each element in a list comprehension, producing a new list:

print([element.get_text() for element in soup.select('h2')])
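To see the difference without fetching the live page, here is a minimal offline sketch. The inline HTML string is a hypothetical stand-in for the real page. It shows that find_all() returns a ResultSet (a subclass of list), which has no get_text(), while find() returns a single Tag, which does.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = "<h2>Markets </h2><p>some text</p><h2>Sports\t\t </h2>"
soup = BeautifulSoup(html, "html.parser")

headings = soup.find_all("h2")
print(type(headings).__name__)        # ResultSet
print(isinstance(headings, list))     # True: ResultSet subclasses list
print(hasattr(headings, "get_text"))  # False: hence the AttributeError

# find() returns the first matching Tag (or None), and a Tag has get_text().
first = soup.find("h2")
print(first.get_text())

# Calling get_text() on each element works, just as in the answer above.
titles = [h.get_text() for h in headings]
print(titles)
```

The trailing spaces and tabs are kept because get_text() returns the text exactly as it appears in the markup.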

Demo:

>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://www.brecorder.com/"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> print([element.get_text() for element in soup.select('h2')])
["Editor's choice", 'Op/Ed ', 'Business & Finance ', 'Markets ', 'Taxation ', 'BR Research ', 'Cotton & Textile ', 'Entertainment ', 'Currency Converter ', 'KSE Market Live ', 'Sports\t\t ']
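The trailing whitespace in the output above (for example 'Sports\t\t ') comes from the page's markup. A sketch of the question's loop with the fix applied might look like this, assuming you want that whitespace trimmed with get_text(strip=True). The names extract_h2_titles and crawl are hypothetical helpers, and the 10-second timeout is an assumption, not part of the original code.

```python
import urllib.request
from collections import deque

from bs4 import BeautifulSoup


def extract_h2_titles(html):
    """Return the whitespace-stripped text of every <h2> element in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # Call get_text() on each Tag, never on the ResultSet itself.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


def crawl(start_url):
    """Fetch each queued URL and print its <h2> headings."""
    urls = deque([start_url])
    while urls:
        current = urls.popleft()
        try:
            html = urllib.request.urlopen(current, timeout=10).read()
        except Exception as exc:
            # Skip URLs that cannot be fetched instead of reusing stale data.
            print("failed:", current, exc)
            continue
        print(extract_h2_titles(html))
```

Calling crawl("http://www.brecorder.com/") fetches the page and prints its headings. Keeping the parsing in a separate function makes it easy to test without network access.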