Invalid token in XML string, cannot create an element tree in Python

Time: 2015-01-15 21:04:18

Tags: python xml string sockets elementtree

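I've run into a problem that may be hard for others to solve. I'm trying to create an element tree from an XML string received over a socket rather than read from a file.

Approach:

The Python script below is a socket client. It receives a string (which happens to be XML) that a C++ server builds with TinyXML.

Program steps: 1) create the socket 2) receive the XML string 3) parse the XML into an element tree that can be used elsewhere

Problem:

The fromstring() function can't seem to make sense of it. Here is my code: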

import socket
import sys
import struct
import binascii
import io
import re
from xml.etree import ElementTree

#illegal characters to remove from string later before going to xml
RE_XML_ILLEGAL = u'([\u0000-\u0008\u000b-\u000c\u000e-\u001f\ufffe-\uffff])' + \
             u'|' + \
             u'([%s-%s][^%s-%s])|([^%s-%s][%s-%s])|([%s-%s]$)|(^[%s-%s])' % \
              (unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
               unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff),
               unichr(0xd800),unichr(0xdbff),unichr(0xdc00),unichr(0xdfff))

HOST = 'localhost'   
PORT = 50008

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print 'Socket created'
print 'Socket now connecting'
s.connect((HOST,PORT))
s.send('1')#as long as we are not sending "0" cpp server will return information.          

#declare global xml object "root"
global root

while 1:
    data = s.recv(1024)#receive the initial message
    data3 = data[:3]#get first 3 letters
    if (data3 == "New"):
        #get ready for new packet
        nextsizestring = data[3:]
        nextsizestring2 = nextsizestring.rstrip('\0')
        nextsize = int(nextsizestring2,10)
        s.send('b')#tell cpp we are ready for the packet

        databuf = s.recv(nextsize)#data buffer as a python string
        databuf2 = re.sub(RE_XML_ILLEGAL, "?", databuf)#remove illegal xml characters
        print(databuf2)
        root = ElementTree.ElementTree(ElementTree.fromstring(databuf2))#convert to element tree
        print(root)

    elif (data3 != "New"):
        print("WARNING! TCP SYNCH HAS FAILED")
    if not data: break#if not data then stop listening for more

    s.send('b')#keep sending anything but zero to get more stuff
s.close()
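
A side note on the receive step: on a TCP socket, `recv(n)` may return fewer than `n` bytes, so a long XML document can arrive incomplete. Below is a minimal sketch of a loop that keeps reading until the announced size has arrived. It is written for Python 3 (the script above is Python 2), and `recv_exact` is a hypothetical helper name, not part of the original script.

```python
import socket


def recv_exact(sock, size):
    """Read exactly `size` bytes from `sock`, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            # The peer closed the connection before sending everything.
            raise ConnectionError(
                "socket closed with %d bytes still missing" % remaining
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the script above, this would replace the call `s.recv(nextsize)`, for example `databuf = recv_exact(s, nextsize)`.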

Here is the output:

Socket created
Socket now connecting
<Frame>
     <FrameNumber ="1509677" />
     <Time ="27427839" />
     <Forceplatedata>
          <Forceplate_0>
               <Subframe#_0>
                    <F_x ="0" />
                    <F_y ="0" />
                    <F_z ="0" />
               </Subframe#_0>
.
.
.
</Frame>

Traceback (most recent call last):
  File "<string>", line 11, in <module>
  File "C:\Users\Gelsey Torres-   Oviedo\Desktop\VizardFolderVRServer\Python2CPP_Client_rev1.py", line 50, in <module>
    root = ElementTree.ElementTree(ElementTree.fromstring(databuf2))
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "C:\Program Files (x86)\WorldViz\Vizard4\bin\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 18

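I took the liberty of truncating the XML string above because it is very long. As the error shows, the parser runs into a problem at line 2, column 18, which I think is the space " " character. I don't understand why that would happen.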
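
To check which character the parser is actually complaining about, the position can be read from the exception itself. The sketch below is for Python 3 and uses a shortened, hypothetical stand-in (`sample`) for the received string. `locate_parse_error` is an illustrative helper name. `ParseError.position` is a `(line, column)` tuple; the line is counted from 1 and, as far as I know, expat counts the column from 0.

```python
from xml.etree import ElementTree

# Shortened stand-in for the string received from the server
# (assumption: same layout as the output shown in the question).
sample = (
    "<Frame>\n"
    '     <FrameNumber ="1509677" />\n'
    "</Frame>\n"
)


def locate_parse_error(text):
    """Return (line, column, offending_line) for a parse error, or None."""
    try:
        ElementTree.fromstring(text)
    except ElementTree.ParseError as err:
        line, column = err.position
        return line, column, text.splitlines()[line - 1]
    return None


result = locate_parse_error(sample)
if result is not None:
    line, column, source_line = result
    print("line %d, column %d" % (line, column))
    print(source_line)
    print(" " * column + "^")  # marker under the reported column
```

With this input the marker should land on the `=` sign rather than on the space before it: after the element name the parser expects an attribute name, and `=` cannot start one.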

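Attempted fixes that failed:

1) passing the string to parse() as a StringIO 2) several variations of encoding to and decoding from UTF-8 3) a similar approach with minidom

I'm guessing this is a syntax issue? I've probably done something really silly...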

1 answer:

Answer 0: (score: 0)

What Senshin said is the key issue: I was creating malformed XML.

By changing every place that looked like

<FrameNumber ="1381949" />

to

<FrameNumber attribute="1381949" />

the program can now create the element tree.
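
The change works because XML requires every attribute to have a name in front of the `=`. Here is a minimal Python 3 sketch comparing the two forms. The attribute name `attribute` comes from the answer; `is_well_formed` and the sample strings are made up for illustration.

```python
from xml.etree import ElementTree


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise."""
    try:
        ElementTree.fromstring(text)
    except ElementTree.ParseError:
        return False
    return True


broken = '<Frame><FrameNumber ="1381949" /></Frame>'
fixed = '<Frame><FrameNumber attribute="1381949" /></Frame>'

print(is_well_formed(broken))
print(is_well_formed(fixed))

# Once the string parses, the value is available as a normal attribute.
root = ElementTree.fromstring(fixed)
frame_number = root.find("FrameNumber").get("attribute")
print(frame_number)
```

A similar rule applies to element names. A name such as `Subframe#_0` in the output from the question contains `#`, which is not allowed in an XML name, so tags like that would presumably need renaming on the server side as well.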

I knew it would be something simple, thanks!