Question

With a script like the following, run from a Unix shell, I can extract a complete HTML tag:

#!/usr/bin/python3

# import the module
from bs4 import BeautifulSoup

# define your object (naming the parser explicitly avoids a warning)
with open("test.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# get the tag
print(soup(itemprop="name"))

where itemprop="name" uniquely identifies the tag I want.

The output looks something like

[<span itemprop="name">
                    Blabla &amp; Bloblo</span>]

Now I want to return only the text part, Blabla & Bloblo.

My attempt was:

print(soup(itemprop="name").getText())

But I get an error message like

AttributeError: 'ResultSet' object has no attribute 'getText'

It did work when I experimented with it in other contexts, for example

print(soup.find('span').getText())

So what am I doing wrong?

Answer 1

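Using the soup object as a callable returns a list of results, as if you had called soup.find_all(). See the documentation:

Because find_all() is the most popular method in the Beautiful Soup search API, you can use a shortcut for it. If you treat the BeautifulSoup object or a Tag object as though it were a function, then it's the same as calling find_all() on that object.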
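To make the shortcut concrete, here is a minimal sketch. The inline HTML string is an assumption standing in for test.html, and the variable names are only illustrative. It checks that calling the soup gives the same list-like ResultSet as find_all(), and that this container has no get_text attribute of its own.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the contents of test.html
html = '<div><span itemprop="name">\n    Blabla &amp; Bloblo</span></div>'
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for find_all()
called = soup(itemprop="name")
found = soup.find_all(itemprop="name")

same_results = called == found              # both hold the same matching tags
is_list_like = isinstance(called, list)     # ResultSet is a list subclass
has_get_text = hasattr(called, "get_text")  # the container has no get_text

print(same_results, is_list_like, has_get_text)
```

Because the result is a list of tags rather than a single tag, text methods have to be called on an element of it, not on the list itself.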

Use soup.find() to get just the first match:

soup.find(itemprop="name").get_text()

or index into the result set:

soup(itemprop="name")[0].get_text()
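Both approaches can be tried end to end with a short sketch like the one below. The inline HTML is an assumption modeled on the output shown in the question, and strip=True is an optional extra (not part of the original answer) that trims the leading newline and indentation from the text. Since find() returns None when nothing matches, the sketch guards against that case.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for test.html, modeled on the output in the question
html = """
<div>
  <span itemprop="name">
                    Blabla &amp; Bloblo</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag, or None if there is no match
tag = soup.find(itemprop="name")
first_text = tag.get_text(strip=True) if tag is not None else None

# Calling the soup returns every match; take the text of each one
all_texts = [t.get_text(strip=True) for t in soup(itemprop="name")]

print(first_text)
print(all_texts)
```

Note that the parser decodes the &amp; entity, so the extracted text contains a plain & character.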

Parsing HTML with Beautiful Soup: return the text from a specific tag
