Question

I have the following HTML:

<div>
    <span class="test">
     <span class="f1">
      5 times
     </span>
    </span>

    </span>
   </div>

<div>

</div>

<div>
    <span class="test">
     <span class="f1">
      6 times
     </span>
    </span>

    </span>
   </div>

I managed to navigate the tree, but I get the following error when I try to print:

AttributeError: 'list' object has no attribute 'text'

This Python code works:

x=soup.select('.f1')
print(x)

It gives the following:

[]
[]
[]
[]
[<span class="f1"> 19 times</span>]
[<span class="f1"> 12 times</span>]
[<span class="f1"> 6 times</span>]
[]
[]
[]
[<span class="f1"> 6 times</span>]
[<span class="f1"> 1 time</span>]
[<span class="f1"> 11 times</span>]

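But print(x.prettify) throws the error above. Basically, I am trying to get the text between the span tags for every instance: blank when there is none, and the string when it is available.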

Answer 1

from bs4 import BeautifulSoup
html = '''<div>
    <span class="test">
     <span class="f1">
      5 times
     </span>
    </span>
    </span>
   </div>
<div>
</div>
<div>
    <span class="test">
     <span class="f1">
      6 times
     </span>
    </span>
    </span>
   </div>'''


soup = BeautifulSoup(html, 'html.parser')
aaa = soup.find_all('span', attrs={'class':'f1'})
for i in aaa:
    print(i.text)

Output:

5 times
6 times

Answer 2

I suggest using the .findAll method and looping over the matched spans.

Example:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

for span in soup.findAll("span", class_="f1"):
    if span.text.isspace():
        continue
    else:
        print(span.text)

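The .isspace() method checks whether the string consists only of whitespace (testing whether the string is truthy does not work here, because a span that looks empty in the HTML still contains spaces and newlines, so its text is not an empty string).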
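To make the difference concrete, here is a small sketch in plain Python comparing the three checks on strings like the ones a span's .text can return. The helper name has_text is hypothetical and not part of Beautiful Soup. Note that .isspace() is False for a truly empty string, so stripping first covers both cases.

```python
def has_text(s):
    """Hypothetical helper: True only if s contains a non-whitespace character."""
    return bool(s.strip())

samples = ["", "   \n ", "\n      5 times\n     "]

for s in samples:
    # Columns: the string, its truthiness, isspace(), and has_text()
    print(repr(s), bool(s), s.isspace(), has_text(s))
```

With has_text, the loop in the answer could skip both empty and whitespace-only spans with a single condition.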

Answer 3

select() returns a list of results, even when there are 0 matches. Since a list object has no text attribute, you get the AttributeError.

Likewise, prettify() is meant to make HTML more readable; it is not a way to display a list.
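As a rough illustration of the point above, the sketch below reproduces the error and shows one way around it. The HTML string is a shortened stand-in for the markup in the question, and the class name .missing is made up to force an empty result.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (an assumption, not the real page)
html = '<div><span class="test"><span class="f1"> 5 times </span></span></div>'
soup = BeautifulSoup(html, "html.parser")

matches = soup.select(".f1")        # a list-like result, even for one match
no_match = soup.select(".missing")  # no matches: an empty result, not None

try:
    matches.text                    # the list itself has no .text
    raised = False
except AttributeError:
    raised = True

# Index into the result (or loop over it) to reach an element that has .text
first_text = matches[0].text if matches else ""
print(raised, repr(first_text), len(no_match))
```

The attribute has to be read from each element of the result, not from the result as a whole.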

If you just want to extract the text (when available):

texts = [''.join(i.stripped_strings) for i in x if i]

# ['5 times', '6 times']

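This removes all the extra spaces/newlines from the strings and gives you just the bare text. The trailing if i means the text is only returned when i is not None.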

If you actually care about the spaces/newlines, do this instead:

texts  = [i.text for i in x if i]

# ['\n      5 times\n     ', '\n      6 times\n     ']
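The question also asks for a blank where no span exists. One possible sketch, assuming each div is the unit that should yield one value, is to look up the span inside each div with select_one() and fall back to an empty string. The HTML below is a cleaned-up version of the question's markup with the stray closing tags removed, which is an assumption about the real page.

```python
from bs4 import BeautifulSoup

# Cleaned-up version of the question's HTML (stray </span> tags dropped)
html = """
<div><span class="test"><span class="f1">
      5 times
</span></span></div>
<div></div>
<div><span class="test"><span class="f1">
      6 times
</span></span></div>
"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for div in soup.find_all("div"):
    span = div.select_one(".f1")  # a single Tag, or None when nothing matches
    # Keep one entry per div: stripped text if found, otherwise a blank string
    texts.append(span.get_text(strip=True) if span is not None else "")

print(texts)  # ['5 times', '', '6 times']
```

Unlike the list comprehensions above, this keeps a placeholder for the div without a match, so the results stay aligned with the rows they came from.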

Beautifulsoup AttributeError: 'list' object has no attribute 'text'
