Question

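I am trying to parse a large XML file (a book of the Bible) using xml.etree.ElementTree in Python 3.4 (I would like to stay with standard-library modules for compatibility with Windows). The relevant methods are below.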

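import re
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


class BibleTree:
    def __init__(self, file_name: str) -> None:
        self.root = ET.parse(file_name).getroot()

    @staticmethod
    def _list_to_clean_text(str_in: str) -> str:
        # Collapse every run of whitespace (newlines included) into one space.
        out = re.sub(r'\s+', ' ', str_in)
        return out.strip()

    @staticmethod
    def _clean_text(intext: Optional[str]) -> str:
        # ElementTree uses None for missing text/tail; treat it as empty.
        return intext if intext is not None else ''

    def __iter__(self) -> Iterator[Tuple[int, int, str]]: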
        collected = None
        cur_chap = 0
        cur_verse = 0

        for child in self.root:
            if child.tag in ['kap', 'vers']:
                if collected and collected.strip():
                    yield cur_chap, cur_verse, self._list_to_clean_text(collected)
                if child.tag == 'kap':
                    cur_chap = int(child.attrib['n'])
                elif child.tag == 'vers':
                    cur_verse = int(child.attrib['n'])
                collected = self._clean_text(child.tail)
            else:
                if collected is not None:
                    collected += self._clean_text(child.text)
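                    collected += self._clean_text(child.tail)

        # Emit the final verse, which no later milestone would otherwise flush.
        if collected and collected.strip():
            yield cur_chap, cur_verse, self._list_to_clean_text(collected)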

The problem is that in some cases (e.g., the <odkazo/> element on line 54) the tail attribute of the variable child is None, although IMHO it should contain text.

Any ideas what I am doing wrong, please?

Answer 1

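This is PEBKAC... I assumed that the milestone elements never occur inside other elements. So I need to rewrite the whole function as a recursive one. Oh well.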
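For what it's worth, here is a rough sketch of what such a recursive rewrite might look like; it is not the code I ended up with. It assumes the milestones are the empty kap and vers elements from the question, each with a numeric n attribute. The names walk and iter_verses, and the kniha and b tags in the sample, are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

MILESTONES = ('kap', 'vers')  # milestone tag names taken from the question


def walk(elem: ET.Element) -> Iterator[Tuple[str, object]]:
    """Yield ('mark', element) and ('text', str) events in document order."""
    if elem.tag in MILESTONES:
        yield 'mark', elem
    if elem.text:
        yield 'text', elem.text
    for child in elem:
        # Recurse first, so milestones nested at any depth are found.
        yield from walk(child)
        # A child's tail belongs to the parent's content, after the child.
        if child.tail:
            yield 'text', child.tail


def iter_verses(root: ET.Element) -> Iterator[Tuple[int, int, str]]:
    """Yield (chapter, verse, text) for the text between milestones."""
    chap = verse = 0
    parts = None  # type: Optional[List[str]]  # None until the first milestone

    def flush() -> str:
        return re.sub(r'\s+', ' ', ''.join(parts or [])).strip()

    for kind, value in walk(root):
        if kind == 'mark':
            if flush():
                yield chap, verse, flush()
            if value.tag == 'kap':
                chap = int(value.attrib['n'])
            else:
                verse = int(value.attrib['n'])
            parts = []
        elif parts is not None:
            parts.append(value)
    if flush():  # the last verse has no following milestone
        yield chap, verse, flush()


# Hypothetical sample: the second verse milestone sits inside <b>.
SAMPLE = ('<kniha><kap n="1"/><vers n="1"/>In the <b>beginning '
          '<vers n="2"/>And the earth</b> was void.<odkazo/></kniha>')

for item in iter_verses(ET.fromstring(SAMPLE)):
    print(item)
```

Flattening the tree into a stream of events first keeps the verse bookkeeping in one plain loop, so the recursion only has to worry about document order.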

Some element.tail attributes are empty, although they should not be
