Question

What is wrong here? I want to extract the text without the tags.

from bs4 import BeautifulSoup
import re
import urllib.request
f = urllib.request.urlopen("http://www.championat.com/football/news-2442480-orlov-zenit-obespokoen---pole-na-novom-stadione-mozhet-byt-nekachestvennym.html")
soup = BeautifulSoup(f, 'html.parser')
soup = soup.find_all('div', class_="text-decor article__contain")
invalid_tags = ['b', 'i', 'u', 'br', 'a']
for tag in invalid_tags:
    for match in soup.find_all(tag):
        match.replaceWithChildren()
soup = ''.join(map(str, soup.contents))
print(soup)

Error:

Traceback (most recent call last):
  File "1.py", line 9, in <module>
    for match in soup.find_all(tag):
AttributeError: 'ResultSet' object has no attribute 'find_all'

Answer 1

soup=soup.find_all('div', class_="text-decor article__contain")

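On this line, soup becomes a ResultSet instance, which is basically a list of Tag instances. You get 'ResultSet' object has no attribute 'find_all' because a ResultSet has no find_all() method. FYI, this exact problem is described in the troubleshooting section of the documentation: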
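To see this for yourself without hitting the network, here is a minimal sketch that runs the same find_all() call against a small hypothetical HTML string standing in for the article page (the markup is an assumption, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumption: the real page
# has a single div with exactly these classes).
html = """
<div class="text-decor article__contain">
  <p>Some <b>bold</b> and <a href="#">linked</a> text.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("div", class_="text-decor article__contain")

print(type(results).__name__)        # ResultSet
print(isinstance(results, list))     # True: ResultSet subclasses list
print(len(results))                  # 1
print(hasattr(results, "find_all"))  # False: this is why the loop fails
print(type(results[0]).__name__)     # Tag: the elements do have find_all()
```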

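AttributeError: 'ResultSet' object has no attribute 'foo' - This usually happens because you expected find_all() to return a single tag or string. But find_all() returns a list of tags and strings, a ResultSet object. You need to iterate over the list and look at the .foo of each one. Or, if you really only want one result, you need to use find() instead of find_all().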
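If there really were several matching divs, the "iterate over the list" option from the docs would look roughly like this. It is a sketch over hypothetical markup with two matching divs; each item in the ResultSet is a Tag, so calling find_all() on it works:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching divs (assumption for illustration).
html = """
<div class="text-decor article__contain"><p>First <b>one</b></p></div>
<div class="text-decor article__contain"><p>Second <i>two</i></p></div>
"""

soup = BeautifulSoup(html, "html.parser")
texts = []
for div in soup.find_all("div", class_="text-decor article__contain"):
    # div is a single Tag, so find_all() is available here.
    formatted = div.find_all(["b", "i"])
    texts.append([t.get_text() for t in formatted])

print(texts)  # [['one'], ['two']]
```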

And you really do want a single result here, since there is only one article on the page:

soup = soup.find('div', class_="text-decor article__contain")
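One thing to keep in mind: find() returns a single Tag when something matches and None when nothing does, so if the site's markup changes you would get an AttributeError on None instead. A small sketch of guarding against that, again over hypothetical HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the article page.
html = '<div class="text-decor article__contain"><p>Text</p></div>'
soup = BeautifulSoup(html, "html.parser")

article = soup.find("div", class_="text-decor article__contain")
missing = soup.find("div", class_="no-such-class")  # hypothetical class name

print(type(article).__name__)  # Tag
print(missing)                 # None
if article is not None:
    print(article.p.get_text())  # Text
```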

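Note that instead of looking up the tags one by one, you can pass a list of tag names directly to find_all(); BeautifulSoup is very flexible when it comes to locating elements: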

article = soup.find('div', class_="text-decor article__contain")

invalid_tags = ['b', 'i', 'u', 'br', 'a']
for match in article.find_all(invalid_tags):
    match.unwrap()  # unwrap() is the bs4 name for replaceWithChildren()
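Putting it together, here is an end-to-end sketch of the fixed script. It uses a hypothetical HTML snippet in place of urllib.request.urlopen(...) so it runs offline; swap the real response back in for actual use. It also shows get_text(), which may be simpler if all you want is the plain text with every tag stripped:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the article page (assumption).
html = ('<div class="text-decor article__contain">'
        '<p>Orlov: <b>Zenit</b> is <i>worried</i>, see <a href="#">here</a>.</p>'
        '</div>')

soup = BeautifulSoup(html, "html.parser")
article = soup.find("div", class_="text-decor article__contain")

invalid_tags = ["b", "i", "u", "br", "a"]
for match in article.find_all(invalid_tags):
    match.unwrap()  # drop the tag itself, keep its children in place

# Same joining step as in the question, now applied to a single Tag.
cleaned = "".join(map(str, article.contents))
print(cleaned)             # <p>Orlov: Zenit is worried, see here.</p>
print(article.get_text())  # Orlov: Zenit is worried, see here.
```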
