How can I parse malformed HTML (no body tag) with BeautifulSoup in Python?

Time: 2014-06-24 01:44:56

Tags: python lxml

I have a bunch of HTML files to parse, but they don't contain a body tag, which causes problems for BeautifulSoup. See below:

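from bs4 import BeautifulSoup  # assuming the bs4 package; the original omitted the import

f = "somefile.html"
with open(f, 'r') as fh:  # close the file once it has been read
    html = fh.read()
soup = BeautifulSoup(html)
print(soup.prettify())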

Output>>>

<html>
 <head>
  <title>
   Test Results
  </title>
 </head>
</html>
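
One thing that may be worth checking is which parser BeautifulSoup ends up using, since the available parsers repair invalid markup differently. Below is a minimal sketch that names the parser explicitly. The `SAMPLE_HTML` string and the `parse_with` helper are hypothetical stand-ins (the real contents of somefile.html are not shown), and whether this fixes the actual files is an assumption, not a confirmed solution.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: markup with no <body> tag, standing in for somefile.html
SAMPLE_HTML = (
    "<html><head><title>Test Results</title></head>"
    "<p>All tests passed</p></html>"
)


def parse_with(html, parser):
    """Parse html with an explicitly named parser and return the soup."""
    return BeautifulSoup(html, parser)


# "html.parser" ships with Python, so no extra install is needed to try it
soup = parse_with(SAMPLE_HTML, "html.parser")
paragraph = soup.find("p")
print(paragraph.get_text() if paragraph is not None else "paragraph was dropped")
```

If lxml or html5lib is installed, passing "lxml" or "html5lib" to `parse_with` instead makes it easy to compare how each parser treats the same file.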

0 answers:

No answers