Trouble with ElementTree

Time: 2019-01-03 12:54:03

Tags: python xml elementtree

I have the following XML file (it contains more than 2 GB of data):

<events version="1.0">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
....
</events>

To read and analyze the data, I tried this approach: http://boscoh.com/programming/reading-xml-serially.html

But when I try to collect the namespaces:

nsmap = {}
for event, elem in etree.iterparse(xmL, events=('start-ns')):
  ns, url = elem
  nsmap[ns] = url
print(nsmap)

I get this error:

Traceback (most recent call last):

  File "<ipython-input-16-6baf583a11d5>", line 1, in <module>
    runfile('C:/Codezeug/Pypy/01/PlayingAround.py', wdir='C:/Codezeug/Pypy/01')

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Codezeug/Pypy/01/PlayingAround.py", line 22, in <module>
    for event, elem in etree.iterparse(one, events=('start-ns')):

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1218, in iterparse
    pullparser = XMLPullParser(events=events, _parser=parser)

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1261, in __init__
    self._parser._setevents(self._events_queue, events)

ValueError: unknown event 's'

How does this code work, and why is it looking for 's'?

1 answer:

Answer 0 (score: 0)

You need to pass a tuple:

for event, elem in etree.iterparse(xmL, events=('start-ns',)): # added , to make it a tuple

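Otherwise the string is treated as an iterable and each character is tried as a separate event name. ('start-ns') is only a string in parentheses, not a tuple, so the first "event" iterparse checks is 's', which is why you get ValueError: unknown event 's'.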
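A small sketch that reproduces the difference, assuming Python 3 and the standard-library xml.etree.ElementTree. The in-memory document (with the namespace prefix "k") and the variable names are made up for illustration; the last loop is the question's nsmap code with only the trailing comma added.

```python
import io
import xml.etree.ElementTree as etree

# Parentheses alone do not make a tuple: this is just the string 'start-ns'.
not_a_tuple = ('start-ns')
# The trailing comma is what makes a one-element tuple.
a_tuple = ('start-ns',)
print(type(not_a_tuple).__name__, type(a_tuple).__name__)

# Iterating over the string yields single characters, so 's' comes first.
print(list(not_a_tuple)[:3])

# Hypothetical sample document that declares one namespace prefix.
doc = b'<events version="1.0" xmlns:k="some_namespace"><event type="actend"/></events>'

# Passing the bare string reproduces the error from the question.
error_message = None
try:
    for _ in etree.iterparse(io.BytesIO(doc), events=not_a_tuple):
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# The question's loop with the tuple fixed: collect prefix -> URI pairs.
nsmap = {}
for event, elem in etree.iterparse(io.BytesIO(doc), events=a_tuple):
    ns, url = elem  # for 'start-ns' events, elem is a (prefix, uri) tuple
    nsmap[ns] = url
print(nsmap)
```

Running it prints the two type names, the first three characters of the string, the error message, and finally the collected mapping for the "k" prefix.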


Your XML also contains no namespaces:

t = """<events version="1.0">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
</events>"""

with open("data.xml","w") as f: f.write(t)

import xml.etree.ElementTree as etree
with open("data.xml") as f:
    for event, elem in etree.iterparse(f, events=('start-ns', )):
        print(event, elem)

This runs, but prints nothing. Change the XML to one that declares a namespace to get output:

t = """<events version="1.0" xmlns:k="some_namespace">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
</events>"""

Output:

start-ns ('k', 'some_namespace')
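Because this file declares no namespaces, the 'start-ns' pass is not needed for the data itself. For reading the 2 GB file serially, here is a minimal sketch using iterparse. It assumes the goal is to tally the type attribute of each event element; the function name count_event_types is hypothetical. It uses the common pattern of grabbing the root from the first 'start' event and clearing it after each processed element, so memory stays roughly bounded instead of the whole tree piling up.

```python
import io
import xml.etree.ElementTree as etree
from collections import Counter


def count_event_types(source):
    """Stream <event> elements from `source` and tally their 'type' attribute.

    Hypothetical helper: `source` may be a filename or a binary file object.
    Processed children are removed from the root as we go, so the tree
    never holds more than a few elements at once.
    """
    counts = Counter()
    context = etree.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the 'start' of <events>
    for event, elem in context:
        if event == "end" and elem.tag == "event":
            counts[elem.get("type")] += 1
            root.clear()  # discard processed <event> children of the root
    return counts


# Usage on the three sample events from the question, kept in memory.
sample = b"""<events version="1.0">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
</events>"""
counts = count_event_types(io.BytesIO(sample))
print(counts)
```

For the real file you would pass its path instead of the BytesIO object. Waiting for the 'end' event before reading attributes is the conservative choice; attributes are also available at 'start', but child content is only complete at 'end'.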