xmlformatter: ignore a less-than sign inside quotes

Time: 2019-03-20 09:02:30

Tags: python xml python-2.7 xml-formatting

I am trying to format the following XML

<block formula="MY_VAR < 3"><set-variable name="OTHER_VAR"></set-variable></block>

into

<block formula="MY_VAR < 3">
  <set-variable name="OTHER_VAR">
  </set-variable>
</block>

using xmlformatter, but it fails because of the < in the formula. Specifically, the error is

ExpatError: not well-formed (invalid token)

when I run this code:

import xmlformatter

my_xml = '<block formula="MY_VAR < 3"><set-variable name="OTHER_VAR"></set-variable></block>'
formatter = xmlformatter.Formatter(indent="1", indent_char="  ", encoding_output="UTF-8", preserve=["literal"])
pretty_xml = formatter.format_string(my_xml)

How can I include a less-than sign in the formula and still be able to format the XML?

1 answer:

Answer 0 (score: 0)

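When building the XML string, you can use xml.sax.saxutils.quoteattr to escape the attribute value. quoteattr also adds the surrounding quotes, which is why the template uses a bare %s.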

>>> import xml.sax.saxutils as su
>>> my_xml = '<block formula=%s><set-variable name="OTHER_VAR"></set-variable></block>' % su.quoteattr('MY_VAR < 3')
>>> my_xml
'<block formula="MY_VAR &lt; 3"><set-variable name="OTHER_VAR"></set-variable></block>'
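As a rough check that the escaped string is now well-formed, the sketch below parses it with the standard library's xml.dom.minidom. minidom is only a stand-in here, on the assumption (suggested by the ExpatError in the question) that xmlformatter hands the string to the same expat parser. build_block is a hypothetical helper name used for illustration.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


def build_block(formula):
    """Hypothetical helper: build the question's <block> element with an escaped formula."""
    # quoteattr escapes &, < and > and wraps the result in quotes,
    # so the template must not add its own quotes around %s.
    template = '<block formula=%s><set-variable name="OTHER_VAR"></set-variable></block>'
    return template % quoteattr(formula)


my_xml = build_block('MY_VAR < 3')

# Parsing raises xml.parsers.expat.ExpatError if the string is not well-formed.
doc = minidom.parseString(my_xml)

# The parser unescapes the attribute, so the original formula comes back.
formula = doc.documentElement.getAttribute('formula')
print(my_xml)
print(formula)
```

The escaping only changes the serialized text; any consumer that parses the XML still sees MY_VAR < 3 as the attribute value.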

If you do not control how the XML is constructed, this hack will repair the XML in the example:

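# The input from the question: the '<' inside the attribute is not escaped.
bad_xml = '<block formula="MY_VAR < 3"><set-variable name="OTHER_VAR"></set-variable></block>'

out = []
prev = None  # the most recent bracket that was kept as markup
for c in bad_xml:
    if c in '<>':
        if c == prev:
            # The same bracket twice in a row: treat the second one as data and escape it.
            out.append('&gt;' if c == '>' else '&lt;')
            continue
        # A bracket that alternates with the previous one is treated as real markup.
        prev = c
    out.append(c)
# Limitation: a '>' inside an attribute is still read as the end of the tag.
my_xml = ''.join(out)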
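A different heuristic, not taken from the answer above, is to escape brackets only inside double-quoted spans. The sketch below assumes that every attribute value is double-quoted and that no text node contains a double quote; escape_attr_brackets is a hypothetical name. It uses minidom only to confirm that the result parses.

```python
import re
from xml.dom import minidom

# Matches one double-quoted span, quotes included.
# Assumption: double quotes appear only around attribute values.
ATTR_VALUE = re.compile(r'"[^"]*"')


def escape_attr_brackets(xml_text):
    """Escape '<' and '>' that occur inside double-quoted attribute values."""
    def fix(match):
        return match.group(0).replace('<', '&lt;').replace('>', '&gt;')
    return ATTR_VALUE.sub(fix, xml_text)


bad_xml = '<block formula="MY_VAR < 3"><set-variable name="OTHER_VAR"></set-variable></block>'
fixed_xml = escape_attr_brackets(bad_xml)

# Raises xml.parsers.expat.ExpatError if the repaired string is still malformed.
minidom.parseString(fixed_xml)
print(fixed_xml)
```

Unlike the bracket-alternation hack, this also handles a '>' inside an attribute, but it would corrupt documents whose text content contains double quotes, so it is only suitable when the input shape is known.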