Question

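Do you know why the first example in the BeautifulSoup tutorial, http://www.crummy.com/software/BeautifulSoup/documentation.html#QuickStart, raises AttributeError: 'NavigableString' object has no attribute 'name'? According to {{3}}, whitespace characters in the HTML cause the problem. I tried it with the source of several pages: one worked, and the others gave the same error (even after I removed the whitespace). Can you explain what "name" refers to and why this error occurs? Thanks.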

Answer 1

name refers to the name of the tag if the object is a Tag object (i.e. for <html>, name = "html").

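If the markup contains whitespace between nodes, BeautifulSoup turns that whitespace into NavigableString objects. So if you use an index into contents to grab a node, you may get a NavigableString instead of the next Tag.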
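
The effect can be seen in a minimal sketch. It assumes the bs4 package (the current BeautifulSoup release, not the older version the tutorial covers) and a made-up HTML snippet; neither comes from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: note the newlines and indentation between the tags.
html = "<html><body>\n  <p>first</p>\n  <p>second</p>\n</body></html>"
soup = BeautifulSoup(html, "html.parser")

# contents lists every direct child, whitespace-only strings included.
children = soup.body.contents
kinds = [type(child).__name__ for child in children]
print(kinds)
```

Here index 0 of contents is the whitespace string, and the first <p> tag only appears at index 1, which is why indexing blindly can hand back a NavigableString.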

To avoid this, query for the node you are looking for: Searching the Parse Tree
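
As a rough illustration of querying instead of indexing, the sketch below uses find and find_all from bs4 on a made-up snippet (the markup and variable names are assumptions for the example).

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace between the tags.
html = "<html><body>\n  <p>first</p>\n  <p>second</p>\n</body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all("p") matches Tag objects by name, so whitespace strings never appear.
paragraphs = soup.find_all("p")
texts = [p.get_text() for p in paragraphs]

# find returns the first match, or None when nothing matches.
first_p = soup.find("p")
missing = soup.find("table")
print(texts, first_p.name, missing)
```

Because the search matches on tag names, every result has a name attribute, and a missing tag shows up as None rather than as an unexpected string.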

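Or, if you know the name of the next tag you want, you can use that name as an attribute. It returns the first Tag with that name, or None if no child with that name exists: Using Tag Names as Members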
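
A small sketch of attribute-style access, again assuming bs4 and an invented snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace between the tags.
html = "<html><body>\n  <p>first</p>\n  <p>second</p>\n</body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.body.p is the first <p> under <body>; the whitespace is skipped.
first_p = soup.body.p
# No <table> exists in this markup, so the lookup gives None.
no_table = soup.body.table
print(first_p.get_text(), no_table)
```

Since the result can be None, it is worth checking for that before reading name or any other attribute from it.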

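If you want to use contents, you have to check the type of each object you are working with. The error you are getting just means that you are trying to access the name attribute of an object that is not a Tag, because the code assumes it is one.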

Answer 2

Ignore NavigableString objects while iterating over the tree:

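import requests
from bs4 import BeautifulSoup, NavigableString, Tag

# url is assumed to be defined earlier (the page to fetch)
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

for body_child in soup.body.children:
    # Whitespace and text between tags appear as NavigableString; skip them
    if isinstance(body_child, NavigableString):
        continue
    # Only Tag objects are handled here
    if isinstance(body_child, Tag):
        print(body_child.name)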
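
A possible alternative, not part of the original answer: in bs4, find_all(True, recursive=False) returns only the direct child tags, so no isinstance check is needed. The markup below is invented for the sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with newlines between the child tags.
html = "<html><body>\n<h1>Title</h1>\n<p>Text</p>\n</body></html>"
soup = BeautifulSoup(html, "html.parser")

# True matches every tag; recursive=False limits the search to direct children.
child_tag_names = [tag.name for tag in soup.body.find_all(True, recursive=False)]
print(child_tag_names)
```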

Answer 3

You can use try/except to skip the cases where the loop hits a NavigableString, like this:

    for j in soup.find_all(...):
        try:
            print(j.find(...))
        # NavigableString is not an exception class; the failure surfaces as AttributeError
        except AttributeError:
            pass

BeautifulSoup: AttributeError: 'NavigableString' object has no attribute 'name'
