Question

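I am trying to decode the XML that draw.io emits. According to their documentation, it is "compressed using standard deflate".

I am using the code provided in this question to inflate it.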

import zlib
import base64

def decode_base64_and_inflate( b64string ):
    decoded_data = base64.b64decode( b64string )
    return zlib.decompress( decoded_data , -15)

Sample input file:

<mxlibrary>[{"xml":"rVLJboMwEP0aH1t5EUg5Bmhy6ilfQMsULBlMbRNIv77jhSAOSD1UwnjmvdmsN0SU/XI19di96wYUEW9ElEZrF61+KUEpwqlsiKgI5xQP4ZcDlgWWjrWBwf0lgceEe60miAjhucLUopF3JKx7qEjk35MfqvjSg3ux8gfRMwacxgXBjUar9ffZWug/FJi1Hs4QSkb6n7qwDMkFD6NHffiuPjd6Ghrwr6dIz510cBvrT8/OqAJinetRhoqlKW5hiOqE7qjl4KyvkxX4YcuSvgqBFyMZFi+fYJ7vQBbAbB8Y/PT3aFYlLcA4WA71DFAS8wq6B2ceGDLLxnUpIoua0w5k261pNIG1jUD7zN3WA420Iau7bWLgdov6Cw==","w":150,"h":100,"aspect":"fixed"}]</mxlibrary>

I am reading it like this:

from xml.dom import minidom
from urllib.parse import unquote
xmldoc = minidom.parse('samplescratchpad.xml')
buildings = xmldoc.getElementsByTagName('mxlibrary')
# I know eval is bad, but this was being returned as '[...]' instead of 
# just a list.
all_buildings = eval(buildings[0].firstChild.nodeValue)
for building in all_buildings:
    print(type(decode_base64_and_inflate(building['xml'])))
    print(decode_base64_and_inflate(building['xml']))
    print(unquote(decode_base64_and_inflate(building['xml'])))

The output of the first two print statements is:

<class 'bytes'>
b'%3CmxGraphModel%3E%3Croot%3E%3CmxCell%20id%3D%220%22%2F%3E%3CmxCell%20id%3D%221%22%20parent%3D%220%22%2F%3E%3CmxCell%20id%3D%222%22%20value%3D%22%26lt%3Bdiv%20style%3D%26quot%3Bfont-size%3A%209px%3B%26quot%3B%26gt%3BAssembler%26lt%3B%2Fdiv%26gt%3B%26lt%3Bdiv%20style%3D%26quot%3Bfont-size%3A%209px%3B%26quot%3B%26gt%3B15%20x%2010%26lt%3B%2Fdiv%26gt%3B%22%20style%3D%22rounded%3D0%3BwhiteSpace%3Dwrap%3Bhtml%3D1%3BfontSize%3D9%3Bpoints%3D%5B%5B0%2C0.33%2C1%5D%2C%5B0%2C0.66%2C1%5D%2C%5B1%2C0.5%2C1%5D%2C%5B0.5%2C0.5%2C0%5D%5D%22%20vertex%3D%221%22%20parent%3D%221%22%3E%3CmxGeometry%20width%3D%22150%22%20height%3D%22100%22%20as%3D%22geometry%22%2F%3E%3C%2FmxCell%3E%3C%2Froot%3E%3C%2FmxGraphModel%3E'

The last print, where I try to convert the above into more standard XML, fails:

  File "test_deflate.py", line 36, in <module>
    print(unquote(decode_base64_and_inflate(building['xml'])))
  File "/usr/lib/python3.5/urllib/parse.py", line 537, in unquote
    if '%' not in string:
TypeError: a bytes-like object is required, not 'str'

How can I fix this so that the bytes object I have (see the first two print outputs) works when I try to unquote it?

Bonus: is eval really needed on the all_buildings = line?

Answer 1

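You are running into trouble because the error message is misleading.

Your argument is a bytes-like object, but the unquote function uses the expression '%' in string, and '%' is a str, not a bytes-like object, so the test fails. Both operands must be bytes, or both must be str.

Python's message nudges you to change the first operand ('%') to bytes, but that operand is hard-coded inside the function, so you cannot. You need to change the other operand, your argument, to str instead.

Try replacing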
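The mixing rule can be checked in isolation. The following is a minimal sketch; the `payload` value is a shortened, hypothetical stand-in for the inflated data, and `mixed_error` is a name introduced here only to capture the message.

```python
# Shortened stand-in for the inflated, still percent-encoded bytes.
payload = b'%3CmxCell%20id%3D%220%22%2F%3E'

# Same type on both sides of `in`: both checks are allowed.
print(b'%' in payload)                 # bytes in bytes
print('%' in payload.decode('utf8'))   # str in str

# Mixed types: a str on the left of a bytes object raises TypeError.
mixed_error = None
try:
    '%' in payload
except TypeError as exc:
    mixed_error = str(exc)

print(mixed_error)
```

The captured message should match the last line of the traceback in the question, which is what unquote runs into internally.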

print(
    unquote(
        decode_base64_and_inflate(building['xml'])
    )
)

with

print(
    unquote(
        decode_base64_and_inflate(
            building['xml']
        ).decode('utf8')
    )
)

This decodes the bytes as UTF-8 text (most likely the correct encoding) and produces a str, which can be passed to unquote().
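Put together, the whole round trip might look like the sketch below. It builds its own small input so that it runs without the sample file: `deflate_and_encode_base64`, `original`, and `library` are hypothetical names used only to create test data. It also uses json.loads in place of eval, on the assumption that the text inside <mxlibrary> is valid JSON, as it appears to be in the sample above.

```python
import base64
import json
import zlib
from urllib.parse import quote, unquote
from xml.dom import minidom


def decode_base64_and_inflate(b64string):
    # Raw deflate stream without a zlib header, hence wbits=-15.
    return zlib.decompress(base64.b64decode(b64string), -15)


def deflate_and_encode_base64(text):
    # Hypothetical inverse helper, used here only to build test input.
    compressor = zlib.compressobj(wbits=-15)
    encoded = quote(text, safe='').encode('utf8')
    raw = compressor.compress(encoded) + compressor.flush()
    return base64.b64encode(raw).decode('ascii')


original = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'
entry = {"xml": deflate_and_encode_base64(original), "w": 150, "h": 100}
library = '<mxlibrary>' + json.dumps([entry]) + '</mxlibrary>'

xmldoc = minidom.parseString(library)
node = xmldoc.getElementsByTagName('mxlibrary')[0]
# Assumes the node text is valid JSON, so no eval is needed.
all_buildings = json.loads(node.firstChild.nodeValue)

decoded = []
for building in all_buildings:
    inflated = decode_base64_and_inflate(building['xml'])  # bytes
    decoded.append(unquote(inflated.decode('utf8')))       # str

print(decoded[0])
```

If the result needs to stay as bytes, urllib.parse.unquote_to_bytes accepts a bytes argument, so the decode step could be skipped in that case.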

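Edit: Python words the error this way because the in operator is internally a method call on the second operand; that is, a in b is evaluated as b.__contains__(a). So b decides which types are allowed for a, not the other way around. That is why Python tells you to change the type of the first operand rather than the second.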
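To see that the right-hand operand is the one doing the checking, here is a small sketch with a made-up container. The class `DigitsOnly` and its error text are purely illustrative and not part of any library.

```python
class DigitsOnly:
    """Hypothetical container that only accepts int on the left of `in`."""

    def __init__(self, *values):
        self._values = set(values)

    def __contains__(self, item):
        # The container (right operand) decides which types it accepts.
        if not isinstance(item, int):
            raise TypeError(
                "an int is required, not %r" % type(item).__name__
            )
        return item in self._values


box = DigitsOnly(1, 2, 3)

# `2 in box` goes through box.__contains__(2).
same_result = (2 in box) == box.__contains__(2)

complaint = None
try:
    'two' in box
except TypeError as exc:
    complaint = str(exc)

print(same_result, complaint)
```

Just as with bytes, the message complains about the left operand's type because it is written from the container's point of view.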

TypeError: a bytes-like object is required, not 'str', but I have a bytes object
