Response header Content-Type: application/xop+xml and parsing with lxml.etree.fromstring

Date: 2019-05-16 02:10:03

Tags: python-3.x python-requests lxml

I am getting a response from a SOAP API whose Content-Type is application/xop+xml. I am not sure how to use Response.text so that lxml.etree.fromstring can consume the XML efficiently.

Here is Response.text:

--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><listResponse xmlns="http://www.strongmail.com/services/v2/schema"><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>101</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>102</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>103</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>107</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>108</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>109</id></objectId></listResponse></soap:Body></soap:Envelope>
--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--

Taking .text and having etree.fromstring parse it:

from lxml import etree
resXML = etree.fromstring(theResponse.text)

gives the following:

    resXML = etree.fromstring(theResponse.text)
  File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

I believe this is because an XML document has to begin with "<", whereas this text begins with the MIME boundary line.
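A quick check on a shortened copy of the body is consistent with that reading. It is only a sketch: the `text` literal stands in for Response.text, and the envelope is reduced to an empty element. It also shows that the first "<" in the text belongs to the Content-ID header rather than to the XML, so cutting at the first "<" would not be enough.

```python
# Shortened stand-in for Response.text (assumption: same layout as the real body).
text = (
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94\n"
    'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"\n'
    "Content-Transfer-Encoding: binary\n"
    "Content-ID: <root.message@cxf.apache.org>\n"
    "\n"
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"/>\n'
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--\n"
)

# The parser sees the boundary first, not an opening tag.
first_char = text.lstrip()[0]

# The first "<" sits inside the Content-ID header of the MIME part.
first_angle = text.find("<")

# The XML itself starts later, after the blank line that ends the part headers.
envelope_start = text.find("<soap:Envelope")

print(repr(first_char), text[first_angle:first_angle + 13], envelope_start > first_angle)
```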

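I looked around the lxml.etree documentation at https://lxml.de/tutorial.html#parsing-from-strings-and-files and found .parse, but that only works on files. Looking at the methods of Response, the documentation mostly covers json, although I can see the header information, such as the content type.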

Is there a method on Response that extracts only the XML portion, without the headers, or is there one in lxml.etree?

1 Answer:

Answer 0: (score: 0)

You can handle it like this:

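# The response body as a str (Response.text in the question)
theResponse = [your response above]

from lxml import etree
from io import StringIO

# The lenient HTML parser tolerates the MIME boundary and part headers
parser = etree.HTMLParser()
tree   = etree.parse(StringIO(theResponse), parser)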

From this point on, lxml can work with it. As a random example, if you are after the links in the response, you can try:

for i in tree.iter():
    if len(i.values()) > 0:
        print(i.values()[0])

The output will be:

http://schemas.xmlsoap.org/soap/envelope/
http://www.strongmail.com/services/v2/schema
http://www.w3.org/2001/XMLSchema-instance
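If the namespaces matter, one alternative is to split the multipart body with the standard library's email parser and hand only the XML part to the XML parser. The following is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the HTTP Content-Type header is a multipart/related value carrying the boundary (the `CONTENT_TYPE` string below is a guess at what such a header might look like), `BODY` is a shortened copy of the body from the question, and `extract_root_xml` is a hypothetical helper name.

```python
from email import message_from_bytes

try:
    from lxml import etree
except ImportError:  # stdlib fallback with a compatible fromstring/iterfind API
    import xml.etree.ElementTree as etree

# Assumed value of the HTTP Content-Type header of the response.
CONTENT_TYPE = (
    'multipart/related; type="application/xop+xml"; '
    'boundary="uuid:051145c9-9210-4e26-a390-d7cdd06b9f94"; '
    'start="<root.message@cxf.apache.org>"; start-info="text/xml"'
)

# Shortened copy of the response body from the question, as bytes.
BODY = b"\r\n".join([
    b"--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94",
    b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"',
    b"Content-Transfer-Encoding: binary",
    b"Content-ID: <root.message@cxf.apache.org>",
    b"",
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<soap:Body>"
    b'<listResponse xmlns="http://www.strongmail.com/services/v2/schema">'
    b"<objectId><id>101</id></objectId><objectId><id>102</id></objectId>"
    b"</listResponse></soap:Body></soap:Envelope>",
    b"--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--",
    b"",
])


def extract_root_xml(body: bytes, content_type: str) -> bytes:
    """Return the payload of the first application/xop+xml part (hypothetical helper)."""
    # The email parser learns the boundary from the outer Content-Type header,
    # so that header is put in front of the body before parsing.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    for part in message.walk():
        if part.get_content_type() == "application/xop+xml":
            return part.get_payload(decode=True)
    raise ValueError("no application/xop+xml part found")


xml_bytes = extract_root_xml(BODY, CONTENT_TYPE)
root = etree.fromstring(xml_bytes)

# The <id> elements inherit the default namespace declared on <listResponse>.
NS = {"sm": "http://www.strongmail.com/services/v2/schema"}
ids = [el.text for el in root.iterfind(".//sm:id", namespaces=NS)]
print(ids)
```

With requests, the two inputs would presumably be theResponse.content and theResponse.headers["Content-Type"]. If the server sends a header without a boundary parameter, this sketch will not split the body and a different approach is needed.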