Well-formed XML throws XMLSyntaxError when read from a zip file

Asked: 2018-10-02 10:39:07

Tags: python xml lxml zipfile

I have a well-formed XML file embedded in a zip archive:

<?xml version="1.0" encoding="utf-8"?>
<board>
  <columns>
    <c name="Work" position="1">
      <tasks>
        <t id="9b860ebd-a18f-4944-bc0c-e6846c03a5a2" />
      </tasks>
    </c>
    <c name="Home" position="2">
      <tasks>
        <t id="6d6c6b90-5f06-49fe-90ea-50227c90bd8c" />
      </tasks>
    </c>
    <c name="Fun" position="3">
      <tasks>
        <t id="bd5f7e33-1011-4c96-8022-900dad135145" />
      </tasks>
    </c>
    <c name="Empty column" position="4">
      <tasks>
      </tasks>
    </c>
  </columns>
</board>

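When this file is parsed from disk rather than from inside the archive, lxml raises no parse or syntax error (the same holds for the standard Python ElementTree). I thought the problem was related to compression, but it is not.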
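
The compression hypothesis can be checked in isolation with a round trip through an in-memory archive. This is only a minimal sketch under assumptions: `SAMPLE` is a shortened stand-in for board.xml, `roundtrip` is a hypothetical helper, and the standard-library parser is used so the snippet has no third-party dependency (lxml's `etree.fromstring` accepts the same bytes).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Shortened stand-in for board.xml (hypothetical sample, not the real file).
SAMPLE = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<board>\n'
    b'  <columns>\n'
    b'    <c name="Work" position="1">\n'
    b'      <tasks>\n'
    b'        <t id="9b860ebd-a18f-4944-bc0c-e6846c03a5a2" />\n'
    b'      </tasks>\n'
    b'    </c>\n'
    b'  </columns>\n'
    b'</board>\n'
)


def roundtrip(data, compression):
    """Write data into an in-memory zip archive and read it back."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr("board.xml", data)
    with zipfile.ZipFile(buf, "r") as zf:
        return zf.read("board.xml")


for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
    restored = roundtrip(SAMPLE, method)
    # The archive member should be byte-for-byte identical to the input,
    # so parsing it should behave exactly like parsing the original bytes.
    print(method, restored == SAMPLE, ET.fromstring(restored).tag)
```

If the round trip preserves the bytes for both methods, the compression setting itself is unlikely to explain the error, and the next thing to look at is whether the archived board.xml really has the same content as the file on disk.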

Works (not in the archive):

import lxml.etree as etree

# Yes, I could parse the file directly but wanted to check xml type
with open("board.xml", "r") as bo:
    xml = bytes(bo.read(), "utf8")

e = etree.fromstring(xml)

Does not work (in the archive):

import zipfile
import lxml.etree as etree
# import xml.etree.ElementTree as etree

# Setting compression argument as zipfile.ZIP_DEFLATED because the archive's
# files were compressed that way changed nothing.

with zipfile.ZipFile("board.zip", "r") as boardzip:
    manifest_xml = boardzip.read("board.xml") or False

    # As seen above, etree.fromstring is given a bytes object. In the case
    # of the embedded file, ZipFile.read already returns a bytes object.
    # Also, manifest_xml has well-formed content.

    if manifest_xml:
        mxml = etree.fromstring(manifest_xml)

Output of the last snippet:

# With LXML
lxml.etree.XMLSyntaxError: expected '>', line 7, column 10
# With standard python
xml.etree.ElementTree.ParseError: mismatched tag: line 7, column 8

Since the mismatched tag is </tasks>, perhaps <t> is not being parsed correctly. But changing <t id="9b860ebd-a18f-4944-bc0c-e6846c03a5a2" /> to <t id="9b860ebd-a18f-4944-bc0c-e6846c03a5a2"></t> made no difference either.
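
Both parsers point at line 7, so a reasonable next step is to look at the raw bytes of that line as stored in the archive and compare them with the file on disk. The working snippet opens board.xml in text mode, which translates newlines and decodes the content, so what it parses is not necessarily byte-identical to the file. The helpers below (`read_member`, `show_lines`, `first_difference`) are hypothetical names for a minimal diagnostic sketch, not part of lxml or zipfile.

```python
import zipfile


def read_member(zip_path, member):
    """Return the raw bytes of one archive member."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member)


def show_lines(data, lineno, context=1):
    """Return repr() strings for the lines around 1-based line `lineno`."""
    lines = data.split(b"\n")
    start = max(lineno - 1 - context, 0)
    end = min(lineno + context, len(lines))
    # repr() makes invisible bytes (NUL, BOM, stray \r) visible.
    return ["%d: %r" % (i + 1, lines[i]) for i in range(start, end)]


def first_difference(a, b):
    """Return the index of the first differing byte, or -1 if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common prefix: either equal or one is longer.
    return -1 if len(a) == len(b) else min(len(a), len(b))
```

With the files from the question, this could be used as `print("\n".join(show_lines(read_member("board.zip", "board.xml"), 7)))`, followed by `first_difference(read_member("board.zip", "board.xml"), open("board.xml", "rb").read())`, assuming board.zip and an extracted board.xml sit in the working directory. A result other than -1 would mean the parser is being handed different content in the two cases.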

Any ideas? (And apologies for any broken English.)

0 answers:

No answers