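How do I parse malformed HTML (no body tag) with BeautifulSoup in Python?
I have a bunch of HTML files that I want to parse, but they don't contain body tags, which causes problems with BeautifulSoup. See below: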
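from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4; the original omits the import
f = "somefile.html"
with open(f, 'r') as fh:  # close the file once it has been read
    html = fh.read()
soup = BeautifulSoup(html)  # no parser named, so bs4 picks the best one installed
print(soup.prettify())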
Output>>>
<html>
<head>
<title>
Test Results
</title>
</head>
</html>
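To confirm which files are actually missing the tag before handing them to BeautifulSoup, one option is to scan each file for a body start tag with the standard library's html.parser. This is only a minimal sketch: BodyTagFinder and has_body_tag are hypothetical names made up for illustration, not part of BeautifulSoup, and the sample markup is an assumed stand-in for the real files.

```python
from html.parser import HTMLParser


class BodyTagFinder(HTMLParser):
    """Records whether a <body> start tag appears in the markup."""

    def __init__(self):
        super().__init__()
        self.found_body = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "body":
            self.found_body = True


def has_body_tag(markup):
    """Return True if the markup contains an explicit <body> start tag."""
    finder = BodyTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found_body


# Assumed sample: a head followed directly by content, with no body tag.
sample = "<html><head><title>Test Results</title></head><p>content</p></html>"
print(has_body_tag(sample))  # prints False
```

Running has_body_tag over the contents of each file would separate the files that need special handling from the ones that already parse as expected.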