I'm trying to do the following:
from lxml import etree
from lxml.etree import fromstring
if request.POST:
    parser = etree.XMLParser(ns_clean=True, recover=True)
    h = fromstring(request.POST['xml'], parser=parser)
    return HttpResponse(h.cssselect('itagg_delivery_receipt status').text_content())
But it gives this error:
[Fri Apr 05 10:27:54 2013] [error] Internal Server Error: /sms/status_postback/
[Fri Apr 05 10:27:54 2013] [error] Traceback (most recent call last):
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
[Fri Apr 05 10:27:54 2013] [error] response = callback(request, *callback_args, **callback_kwargs)
[Fri Apr 05 10:27:54 2013] [error] File "/usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
[Fri Apr 05 10:27:54 2013] [error] return view_func(*args, **kwargs)
[Fri Apr 05 10:27:54 2013] [error] File "/srv/project/livewireSMS/sms/views.py", line 42, in update_delivery_status
[Fri Apr 05 10:27:54 2013] [error] h = fromstring(request.POST['xml'], parser=parser)
[Fri Apr 05 10:27:54 2013] [error] File "lxml.etree.pyx", line 2754, in lxml.etree.fromstring (src/lxml/lxml.etree.c:54631)
[Fri Apr 05 10:27:54 2013] [error] File "parser.pxi", line 1569, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:82659)
[Fri Apr 05 10:27:54 2013] [error] ValueError: Unicode strings with encoding declaration are not supported.
Here is the XML:
<?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>845tgrgsehg394g3hdfhhh56445y7ts6</submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt>
I have no control over the XML document; it comes from the SMS company.
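The error can be reproduced without Django. A minimal sketch, assuming Python 3 and lxml (a Python 3 str corresponds to the unicode object that request.POST['xml'] holds in Python 2.7): lxml refuses decoded text whose XML declaration names an encoding, while the same document passed as bytes parses. The sample document is a shortened stand-in for the receipt above, and the helper name try_parse is hypothetical.

```python
from lxml import etree

# Shortened stand-in for the receipt: decoded text that still declares an encoding
text = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
        '<itagg_delivery_receipt><status>Delivered</status></itagg_delivery_receipt>')


def try_parse(data):
    """Return the parsed root element, or the ValueError lxml raises."""
    try:
        return etree.fromstring(data)
    except ValueError as exc:
        return exc


unicode_result = try_parse(text)                     # str input: rejected
bytes_result = try_parse(text.encode('ISO-8859-1'))  # bytes input: parsed

print(type(unicode_result).__name__)    # ValueError
print(bytes_result.findtext('status'))  # Delivered
```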
Answer 0 (score: 31)
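You have to encode the string to bytes and then force the same encoding in the parser. lxml only accepts an encoding declaration on byte input, and since the declaration still says ISO-8859-1, the parser has to be told that the bytes are really UTF-8: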
from lxml import etree
from lxml.etree import fromstring
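if request.POST:
    # Encode the decoded text back to bytes, then make the parser read them as UTF-8
    xml = request.POST['xml'].encode('utf-8')
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    h = fromstring(xml, parser=parser)
    # cssselect() (needs the separate cssselect package) returns a list; take the first match
    return HttpResponse(h.cssselect('itagg_delivery_receipt status')[0].text_content())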
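The same approach can be tried outside Django on the receipt from the question. A minimal sketch, assuming Python 3 and lxml; delivery_status is a hypothetical helper, RECEIPT plays the role of request.POST['xml'], and findtext is used instead of cssselect so the extra cssselect package is not needed.

```python
from lxml import etree

# The receipt from the question as decoded text, as request.POST['xml'] would hold it
RECEIPT = """<?xml version="1.1" encoding="ISO-8859-1"?>
<itagg_delivery_receipt>
<version>1.0</version>
<msisdn>447889000000</msisdn>
<submission_ref>845tgrgsehg394g3hdfhhh56445y7ts6</submission_ref>
<status>Delivered</status>
<reason>4</reason>
<timestamp>20050709120945</timestamp>
<retry>0</retry>
</itagg_delivery_receipt>"""


def delivery_status(xml_text):
    """Return the text of the <status> element of a delivery receipt."""
    # Encode to UTF-8 bytes and override the ISO-8859-1 named in the declaration
    parser = etree.XMLParser(ns_clean=True, recover=True, encoding='utf-8')
    root = etree.fromstring(xml_text.encode('utf-8'), parser=parser)
    # <status> is a direct child of the root element
    return root.findtext('status')


print(delivery_status(RECEIPT))  # Delivered
```

In the view, the return value of such a helper would simply be wrapped in HttpResponse.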
Answer 1 (score: 9)
The following solution by kernc worked for me:
>>> from lxml import etree
>>> xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'
>>> xml = bytes(bytearray(xml, encoding='utf-8'))  # the added line: convert the text to UTF-8 bytes
>>> etree.XML(xml)
<Element foo at 0x5b44c90>
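The added line is just a roundabout way of encoding the text. A small check, assuming Python 3 and lxml, that it produces the same bytes as str.encode and that the parsed root is foo:

```python
from lxml import etree

xml = u'<?xml version="1.0" encoding="utf-8" ?><foo><bar/></foo>'

# kernc's conversion to bytes
via_bytearray = bytes(bytearray(xml, encoding='utf-8'))
# The more direct spelling of the same conversion
via_encode = xml.encode('utf-8')

print(via_bytearray == via_encode)  # True

root = etree.XML(via_encode)
print(root.tag)  # foo
```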
Answer 2 (score: 0)
Simpler than the answers above (this form of bytes() needs Python 3):
from lxml import etree
# r is the response object returned by the HTTP request for the data
data = etree.fromstring(bytes(r.text, encoding='utf-8'))
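One caveat: bytes(r.text, encoding='utf-8') yields UTF-8 bytes, but lxml still decodes them with whatever encoding the XML declaration names. For an ASCII-only receipt that makes no difference; for a document declared as ISO-8859-1 that contains non-ASCII characters, the text comes out garbled unless the parser encoding is overridden or the text is encoded with the declared encoding. A minimal sketch, assuming Python 3 and lxml, with a hypothetical sample string standing in for r.text:

```python
from lxml import etree

# Hypothetical stand-in for r.text: decoded text that still declares ISO-8859-1
text = '<?xml version="1.0" encoding="ISO-8859-1"?><reason>café</reason>'

# UTF-8 bytes read as ISO-8859-1 (the declared encoding): é becomes Ã©
naive = etree.fromstring(bytes(text, encoding='utf-8'))
print(naive.text)  # cafÃ©

# Overriding the parser encoding makes lxml read the bytes as UTF-8
parser = etree.XMLParser(encoding='utf-8')
fixed = etree.fromstring(bytes(text, encoding='utf-8'), parser=parser)
print(fixed.text)  # café

# Alternatively, encode with the encoding the declaration actually names
matching = etree.fromstring(text.encode('ISO-8859-1'))
print(matching.text)  # café
```

If r is a requests response, r.content already holds the raw undecoded bytes, so passing it straight to etree.fromstring lets lxml apply the declaration to the original bytes.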