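I want to extract only the headings from this page, but when I use the get_text()
method an error occurs. What is the solution in this case? Please explain with an example. I am running this code on Python 3.4.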
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.brecorder.com/"
urls = [url]
visited = [url]
while len(urls)>0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
        response = htmltext
    except:
        print(urls[0])
    soup = BeautifulSoup(response)
    urls.pop(0)
    soup = soup.find_all("h2")
    print(soup.get_text())
The error is: AttributeError: 'ResultSet' object has no attribute 'get_text'
Or, if I replace this line
soup = soup.find_all("h2")
with this one
soup = soup.select("h2")
the following error occurs:
AttributeError: 'list' object has no attribute 'get_text'
Answer 0 (score: 1)
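You are trying to apply a method that is defined on individual elements to a whole collection of them.
soup.select()
and soup.find_all()
both return a list of elements, not a single element. Use a loop instead: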
for element in soup.select('h2'):
    print(element.get_text())
Or you can apply the method to each element in a list comprehension to produce a new list:
print([element.get_text() for element in soup.select('h2')])
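To see the same idea without a network request, here is a minimal offline sketch. The HTML string is made-up sample markup (not taken from the site), and passing "html.parser" explicitly is my own choice so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
html = "<h2>Markets</h2><p>body text</p><h2>Sports</h2>"
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which is a list of Tag objects.
headings = soup.find_all("h2")

# get_text() lives on each Tag, so call it per element.
titles = [h.get_text(strip=True) for h in headings]
print(titles)  # ['Markets', 'Sports']
```

The strip=True argument trims surrounding whitespace from each heading, which may be useful on real pages where headings carry trailing spaces or tabs.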
Demo:
>>> import urllib.request
>>> from bs4 import BeautifulSoup
>>> url = "http://www.brecorder.com/"
>>> soup = BeautifulSoup(urllib.request.urlopen(url))
>>> print([element.get_text() for element in soup.select('h2')])
["Editor's choice", 'Op/Ed ', 'Business & Finance ', 'Markets ', 'Taxation ', 'BR Research ', 'Cotton & Textile ', 'Entertainment ', 'Currency Converter ', 'KSE Market Live ', 'Sports\t\t ']
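Applied to the loop from the question, the fix might look roughly like the sketch below. The function names extract_headings and crawl are hypothetical helpers introduced only for illustration, and the timeout value and the explicit "html.parser" argument are assumptions rather than anything from the original code. The sketch also skips a URL that fails to load, since the original except branch would otherwise fall through and parse a missing or stale response.

```python
import urllib.request
from urllib.error import URLError

from bs4 import BeautifulSoup


def extract_headings(html, tag="h2"):
    """Return the stripped text of every `tag` element found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() gives a list of elements; call get_text() on each one.
    return [el.get_text(strip=True) for el in soup.find_all(tag)]


def crawl(urls):
    """Fetch each URL in order and print its headings (hypothetical helper)."""
    queue = list(urls)
    while queue:
        current = queue.pop(0)
        try:
            # The 10 second timeout is an assumed value.
            with urllib.request.urlopen(current, timeout=10) as resp:
                html = resp.read()
        except (URLError, OSError) as exc:
            print("could not fetch", current, exc)
            continue  # nothing to parse for this URL
        for title in extract_headings(html):
            print(title)
```

Calling crawl(["http://www.brecorder.com/"]) should then print one heading per line, assuming the page is reachable and still uses h2 elements for its section titles.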