Question

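I am new to BeautifulSoup4 and have been studying it intensively. My problem is with the following code (I found it in the documentation at https://www.crummy.com/software/BeautifulSoup/bs4/doc/, in the section about defining a function):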

  def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')     (A)
  soup.find_all(has_class_but_no_id)

I expected to get this result (see the documentation):

  # [<p class="title"><b>The Dormouse's story</b></p>,
  #  <p class="story">Once upon a time there were...</p>,       (B)
  #  <p class="story">...</p>]

But I got the following result instead:

  [<p class="title"><b>The Dormouse's story</b></p>, <p class="story">Once 
  upon a time there were three little sisters; and their names were
  <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,                     
  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>; 
  and they lived at the bottom of a well.</p>, <p class="story">...</p>]

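I checked the documentation and found that it only recommends the .has_attr method, with no further details. How can I change the initial code (A) to get the expected result (B)? Can anyone help with this? Thanks.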

Answer 1

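It does work. Note that the filter is applied to each tag on its own; for the second result in the list, the same condition is not checked against its inner (child) tags. The wrapping <p class="story"> therefore satisfies the condition and is put into the result list together with all of its contents.

This result list: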

[<p class="title"><b>The Dormouse's story</b></p>,
 -------------------------
 <p class="story">Once 
      upon a time there were three little sisters; and their names were
      <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
      <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
      <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>; 
      and they lived at the bottom of a well.</p>,
 -------------------------
 <p class="story">...</p>]

contains three tags, and each item has a 'class' attribute and no 'id' attribute.
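
One way to check this yourself is sketched below. It assumes bs4 is installed and uses an abbreviated copy of the documentation's sample markup (the `html_doc` string here is shortened for illustration). It lists the names of the matched tags and confirms that no 'a' tag is itself an item of the result list.

```python
from bs4 import BeautifulSoup

# Abbreviated version of the documentation's sample document
# (an assumption for illustration, not the full original text).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")


results = soup.find_all(has_class_but_no_id)

# The filter runs once per tag, so each list item is a tag that passed it.
matched_names = [tag.name for tag in results]
print(matched_names)

# The 'a' tags fail the filter (they have an id), so none of them is a
# list item. They only appear because printing a matched 'p' tag also
# prints its children.
a_in_results = any(tag.name == "a" for tag in results)
print(a_in_results)
```

The nested links are still reachable from the second item (for example with `results[1].find_all("a")`), which is why they show up in the printed output.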

Answer 2

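The docs say:

This function only picks up the 'p' tags. It doesn't pick up the 'a' tags, because those tags define both "class" and "id". It doesn't pick up tags like 'html' and 'title', because those tags don't define "class".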

{{1}}

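This is unclear enough that it leads people to expect a result without any 'a' tags in it. They should change either the statement or the example.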
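
If what you want is output that looks like the shortened listing in the docs, one possible approach is to print a summary of each matched tag instead of the tag itself. The sketch below is only an illustration: `summarize` is a hypothetical helper (not part of Beautiful Soup), the sample markup is abbreviated, and the width of 40 characters is an arbitrary choice.

```python
import textwrap

from bs4 import BeautifulSoup

# Abbreviated sample markup (an assumption for illustration).
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")


def has_class_but_no_id(tag):
    return tag.has_attr("class") and not tag.has_attr("id")


def summarize(tag, width=40):
    """Hypothetical helper: render a tag with its text only, truncated."""
    # get_text() drops the nested markup and keeps just the text content.
    text = " ".join(tag.get_text().split())
    short = textwrap.shorten(text, width=width, placeholder="...")
    classes = " ".join(tag.get("class", []))
    return f'<{tag.name} class="{classes}">{short}</{tag.name}>'


summaries = [summarize(tag) for tag in soup.find_all(has_class_but_no_id)]
for line in summaries:
    print(line)
```

This does not change which tags are selected; it only changes how each match is displayed.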

BeautifulSoup4 documentation example doesn't work

2 answers: