Parse HTML files in a directory and check whether they are badly formed in Python

Time: 2012-06-12 17:46:12

Tags: python html beautifulsoup non-well-formed

I want to write a script that walks through a directory and checks whether each HTML file is badly formed. Here is my code so far:

import os

directory = "html"
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.html'):
            # Help needed here
            if file is badly formed:  # pseudocode: this is the check I need
                print("Badly Formed")
            else:
                print("Well Formed")

1 answer:

Answer 0: (score: 1)

import xml.etree.ElementTree as ETree
....

    try:
        self.doc = ETree.parse( file )
        # do stuff with it ...
    except ETree.ParseError as err:
        # err is the raised exception instance; its message includes the line and column
        print( "ERROR in {0} : {1}".format( file, err ) )
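For reference, the directory walk from the question and the ETree.parse check from this answer could be combined roughly as follows. This is only a sketch: the function name check_html_files is made up for illustration, and it assumes the files are meant to be XML-well-formed (XHTML-style). ElementTree is an XML parser, so ordinary HTML with unclosed tags such as <br>, or with entities such as &nbsp;, will be reported as badly formed even though browsers accept it.

```python
import os
import xml.etree.ElementTree as ETree


def check_html_files(directory):
    """Return a list of (path, error) pairs for every .html file under directory.

    error is None when the file parses as well-formed XML, otherwise it is
    the parser's message. check_html_files is a hypothetical helper name.
    """
    results = []
    for root, dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith('.html'):
                continue
            # os.walk yields bare file names, so join them with the folder path
            path = os.path.join(root, name)
            try:
                ETree.parse(path)
                results.append((path, None))
            except ETree.ParseError as err:
                results.append((path, str(err)))
    return results


if __name__ == "__main__":
    for path, error in check_html_files("html"):
        if error is None:
            print("Well Formed: {0}".format(path))
        else:
            print("Badly Formed: {0} ({1})".format(path, error))
```

Note that the file name from os.walk has to be joined with root before parsing; passing the bare name only works for files in the current working directory.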