
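I have a file that contains multiple doc tags. Each doc tag contains a docID tag. If the docID tag matches, I need to get everything inside that doc tag. I am using HTMLParser to parse the file.
So what I need to do is:
1. Iterate recursively over all doc tags.
2. For each doc tag, if the docID tag inside it matches, get everything under that doc tag.
3. Repeat step 2 for all doc tags.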
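
The steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the sample markup, the function name `find_docs`, and the set of wanted IDs are all hypothetical, since the real file layout is not shown. One thing to keep in mind is that the HTML parser lowercases tag names, so `docID` has to be looked up as `docid`.

```python
from lxml import etree

# Hypothetical sample input; the real file layout is an assumption.
SAMPLE = """
<doc><docID>A1</docID><p>alpha</p></doc>
<doc><docID>B2</docID><p>beta</p></doc>
"""


def find_docs(markup, wanted_ids):
    """Return {docID: serialized <doc> element} for every matching doc."""
    root = etree.fromstring(markup, etree.HTMLParser())
    matches = {}
    # iter() walks the whole tree, so nested <doc> tags are found as well.
    for doc in root.iter("doc"):
        # The HTML parser lowercases tag names: <docID> becomes <docid>.
        doc_id = doc.findtext("docid")
        if doc_id is not None and doc_id.strip() in wanted_ids:
            # with_tail=False drops the whitespace that follows </doc>.
            matches[doc_id.strip()] = etree.tostring(
                doc, encoding="unicode", with_tail=False
            )
    return matches


if __name__ == "__main__":
    for doc_id, xml in find_docs(SAMPLE, {"B2"}).items():
        print(doc_id, xml)
```

For real files, `etree.parse(path, etree.HTMLParser()).getroot()` could stand in for the `etree.fromstring` call.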

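from lxml import etree

def get_docs(self, filepaths):
    # The HTML parser tolerates non-standard tags such as <doc>.
    parser = etree.HTMLParser()
    for file in filepaths:
        tree = etree.parse(file, parser)
        # tree = etree.parse(file)
        # Collect every <doc> element at any depth.
        docs = tree.findall('.//doc')
        for elem in docs:
            print(etree.tostring(elem))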

I am currently trying to get the content inside each doc tag, but text_content() fails. When I call it, I get this error:
AttributeError: 'lxml.etree._Element' object has no attribute 'text_content'
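
For context on the error: `text_content()` is a method of the element class used by `lxml.html`, not of plain `lxml.etree._Element` objects, which is what `etree.HTMLParser()` produces. The sketch below shows two possible workarounds, assuming a hypothetical one-line sample document; the function names are made up for illustration.

```python
from lxml import etree
import lxml.html

# Hypothetical sample markup, used only for illustration.
MARKUP = "<doc><docID>A1</docID><p>Hello <b>world</b></p></doc>"


def text_with_etree(markup):
    """Keep etree.HTMLParser and join the text nodes manually."""
    root = etree.fromstring(markup, etree.HTMLParser())
    doc = next(root.iter("doc"))
    # itertext() yields every text node below the element, in document order.
    return "".join(doc.itertext())


def text_with_lxml_html(markup):
    """Parse with lxml.html, whose elements do provide text_content()."""
    root = lxml.html.fromstring(markup)
    # iter() includes the root itself, so this works wherever <doc> ends up.
    doc = next(root.iter("doc"))
    return doc.text_content()


if __name__ == "__main__":
    print(text_with_etree(MARKUP))
    print(text_with_lxml_html(MARKUP))
```

Both approaches return the concatenated text of the element and all its descendants, including the docID text. If the markup itself is needed rather than the text, `etree.tostring(elem)` as in the code above remains the way to go.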

How to find all occurrences of a tag using lxml

0 answers: