Question

I'm parsing some XML with BeautifulSoup, and the data looks like this:

soup.findAll('title')

<title>main title</title>
<title>other title</title>
<title>another title</title>

When iterating over the tags, I want to skip the first title. So I have:

for e in soup.findAll('title'):
    if e == '<title>main title</title>':
        pass
    else:
        print (e)

This still returns every title, including main title. I also tried removing the title tags.

Thanks for your help.

Answer 1

If you want to skip the first title, a better solution is to slice the list:

>>> soup.findAll('title')[1:]
[<title>other title</title>, <title>another title</title>]
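For anyone who wants to run this end to end, here is a minimal sketch of the slicing approach. It assumes the three-title sample from the question and the built-in "html.parser" backend; the variable names markup and remaining are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the parser choice is an assumption.
markup = """<title>main title</title>
<title>other title</title>
<title>another title</title>"""
soup = BeautifulSoup(markup, "html.parser")

# find_all returns a list-like result, so ordinary slicing drops the first match.
remaining = soup.find_all("title")[1:]

for tag in remaining:
    print(tag)
```

Slicing skips by position rather than by content, so it keeps working even if the text of the first title changes.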

Answer 2

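You are comparing a <class 'bs4.element.Tag'> object with a string. That comparison does not do what you expect: it always evaluates to False. Convert the tag to a string first, then compare.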

Try this:

for e in soup.find_all("title"):
    if str(e) == '<title>main title</title>':
        pass
    else:
        print (e)

Output:

<title>other title</title>
<title>another title</title>
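To see the difference directly, the following sketch compares a single tag both ways. It assumes a one-title document parsed with "html.parser"; the names first, tag_equals_string and markup_equals_string are made up for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<title>main title</title>", "html.parser")
first = soup.find("title")

# Tag compared with str: the types differ, so this is not an equality match.
tag_equals_string = first == "<title>main title</title>"

# str compared with str: the serialized markup matches the literal.
markup_equals_string = str(first) == "<title>main title</title>"

print(type(first).__name__, tag_equals_string, markup_equals_string)
```

This is why the loop in the question never takes the pass branch: every comparison there is a Tag against a str.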

Answer 3

You can check the node's text attribute instead of the node itself.

from bs4 import BeautifulSoup
soup = BeautifulSoup("""<title>main title</title>
<title>other title</title>
<title>another title</title>""", "html.parser")

for e in soup.find_all("title"):
    if e.text != 'main title':
        print(e)

#<title>other title</title>
#<title>another title</title>

Comparing the tag itself with a string never evaluates to True.
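One caveat with comparing text: real XML often has whitespace around the content, in which case an exact match on .text can fail. A possible workaround, sketched here under the assumption that stray whitespace is the only difference, is get_text(strip=True). The padded markup below is a hypothetical variant of the question's sample.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the sample with padding around the first title's text.
markup = """<title>  main title
</title>
<title>other title</title>"""
soup = BeautifulSoup(markup, "html.parser")

# get_text(strip=True) trims surrounding whitespace before the comparison.
kept = [t for t in soup.find_all("title")
        if t.get_text(strip=True) != "main title"]

for tag in kept:
    print(tag)
```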
