I have an HTML file:
<html>
<head>
</head>
<body>
</body>
</html>
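Using BeautifulSoup, how can I traverse this HTML tree? I want to know whether the head tag is inside the html tag.
I tried the following to find the html tag, but how do I now test whether head is inside html?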
from bs4 import BeautifulSoup

invalid = """<html>
<html>
</html>
</html>"""
soup = BeautifulSoup(invalid, 'html.parser')
# find() returns a Tag or None, never 1, so test against None
if soup.find("html") is not None:
    print('found')
else:
    print('no html tag')
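Answer 0 (score: 0)
Beautiful Soup has a built-in way to do this: a tag name used as an attribute, such as soup.head, returns the first matching tag, or None if there is none.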
from bs4 import BeautifulSoup
invalid = """<html>
<body>
</body>
</html>"""
valid = """<html>
<head>
</head>
<body>
</body>
</html>"""
def hashead(soup):
    # soup.head is the first <head> tag, or None if the document has none
    if soup.head is not None:
        print('found head')
    else:
        print('no head')
badsoup = BeautifulSoup(invalid, 'html.parser')
goodsoup = BeautifulSoup(valid, 'html.parser')
hashead(badsoup)
# >>> no head
hashead(goodsoup)
# >>> found head
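The check above only reports whether a head tag exists anywhere in the document. To answer the original question more literally (is head directly inside html?), one option is to search only the direct children of the html tag. The following is a minimal sketch; the helper name head_in_html is hypothetical and not part of Beautiful Soup, and it assumes the 'html.parser' builder, which keeps the markup as written.

```python
from bs4 import BeautifulSoup


def head_in_html(markup):
    """Return True if <head> is a direct child of <html> (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    html = soup.find("html")
    if html is None:
        # No <html> tag at all, so <head> cannot be inside it
        return False
    # recursive=False limits the search to direct children of <html>
    return html.find("head", recursive=False) is not None


print(head_in_html("<html><head></head><body></body></html>"))
print(head_in_html("<html><body></body></html>"))
```

Note that other parsers, such as html5lib, may repair the markup and insert missing tags while parsing, so the result can depend on which parser is passed to BeautifulSoup.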