Question

I have an HTML file:

<html>
<head>
</head>
<body>

</body>

</html>

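Using BeautifulSoup, how can I traverse this HTML tree? I want to know whether the head tag is inside the html tag.

I tried the following to find the html tag, but how do I now test whether head is inside html?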

invalid = """<html>
<html>

</html>
</html>"""

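from bs4 import BeautifulSoup

soup = BeautifulSoup(invalid, 'html.parser')
# find() returns a Tag or None, never 1, so compare against None
if soup.find("html") is not None:
    print('found')
else:
    print('no html tag')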

Answer 1

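Beautiful Soup has a built-in shortcut for this: accessing a tag name as an attribute (for example soup.head) returns the first matching tag, or None if there is none.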

from bs4 import BeautifulSoup


invalid = """<html>
<body>
</body>
</html>"""

valid = """<html>
<head>
</head>
<body>
</body>
</html>"""


def hashead(soup):
    # soup.head is None when the document has no <head> tag
    if soup.head:
        print('found head')
    else:
        print('no head')

badsoup = BeautifulSoup(invalid, 'html.parser')
goodsoup = BeautifulSoup(valid, 'html.parser')

hashead(badsoup)
# >>> no head
hashead(goodsoup)
# >>> found head
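
Note that soup.head matches a head tag anywhere in the document, while the question asks whether head sits inside html. A minimal sketch of that stricter check follows, assuming "inside" means a direct child of the html tag; the helper name head_inside_html and the sample strings are made up for illustration.

```python
from bs4 import BeautifulSoup


def head_inside_html(markup):
    """Return True if a <head> tag is a direct child of the <html> tag."""
    soup = BeautifulSoup(markup, 'html.parser')
    html = soup.find('html')
    if html is None:
        # No <html> tag at all, so <head> cannot be inside it
        return False
    # recursive=False restricts the search to direct children of <html>
    return html.find('head', recursive=False) is not None


# Hypothetical sample documents
nested = "<html><head></head><body></body></html>"
no_head = "<html><body></body></html>"
stray_head = "<head></head><html><body></body></html>"

print(head_inside_html(nested))      # True
print(head_inside_html(no_head))     # False
print(head_inside_html(stray_head))  # False
```

This relies on 'html.parser' keeping the tree as written. Parsers such as lxml or html5lib may repair the markup and add or move a head tag, so the result can differ with them.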

How to check whether HTML has a head element with BeautifulSoup.
