I have the following XML file (it contains more than 2 GB of data):
<events version="1.0">
<event time="10998.0" type="actend" person="1" link="link36" actType="home" />
<event time="10998.0" type="departure" person="1" link="link36" legMode="car" />
<event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1" />
....
</events>
To read and analyze the data, I tried this approach: http://boscoh.com/programming/reading-xml-serially.html
But when I try to collect the namespaces:
nsmap = {}
for event, elem in etree.iterparse(xmL, events=('start-ns')):
    ns, url = elem
    nsmap[ns] = url
print(nsmap)
I get this error:
Traceback (most recent call last):
  File "<ipython-input-16-6baf583a11d5>", line 1, in <module>
    runfile('C:/Codezeug/Pypy/01/PlayingAround.py', wdir='C:/Codezeug/Pypy/01')
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Codezeug/Pypy/01/PlayingAround.py", line 22, in <module>
    for event, elem in etree.iterparse(one, events=('start-ns')):
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1218, in iterparse
    pullparser = XMLPullParser(events=events, _parser=parser)
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1261, in __init__
    self._parser._setevents(self._events_queue, events)
ValueError: unknown event 's'
How does this code work, and why is it looking for 's'?
Answer
You need to pass a tuple:
for event, elem in etree.iterparse(xmL, events=('start-ns',)): # added , to make it a tuple
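Without the trailing comma, ('start-ns') is just the string 'start-ns' in parentheses. iterparse treats that string as an iterable and tries each character as a separate event name, so the first one it rejects is 's'.
Your XML does not declare any namespaces: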
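A quick way to see the difference, assuming Python 3 with the standard-library xml.etree.ElementTree: the sketch below compares the two spellings and feeds each to iterparse over a small in-memory sample. The helper try_events and the sample bytes are illustrative only, not part of the original question.

```python
import io
import xml.etree.ElementTree as ET

# Parentheses alone do not make a tuple; the trailing comma does.
not_a_tuple = ("start-ns")    # just the str "start-ns"
a_tuple = ("start-ns",)       # a one-element tuple

print(type(not_a_tuple).__name__)   # str
print(type(a_tuple).__name__)       # tuple
# Iterating over the string yields single characters, so 's' comes first.
print(list(not_a_tuple)[:3])        # ['s', 't', 'a']

# Small illustrative sample (no namespaces).
data = b'<events version="1.0"><event time="10998.0" type="actend"/></events>'


def try_events(events):
    """Run iterparse over the sample; report whether the events spec is accepted."""
    try:
        for _ in ET.iterparse(io.BytesIO(data), events=events):
            pass
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"


string_result = try_events(not_a_tuple)  # each character is treated as an event name
tuple_result = try_events(a_tuple)       # accepted; simply yields nothing here
print(string_result)
print(tuple_result)
```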
t = """<events version="1.0">
<event time="10998.0" type="actend" person="1" link="link36" actType="home" />
<event time="10998.0" type="departure" person="1" link="link36" legMode="car" />
<event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1" />
</events>"""
with open("data.xml","w") as f: f.write(t)
import xml.etree.ElementTree as etree
with open("data.xml") as f:
    for event, elem in etree.iterparse(f, events=('start-ns', )):
        print(event, elem)
This works but prints nothing. Change the XML to one that declares a namespace to get output:
t = """<events version="1.0" xmlns:k="some_namespace">
<event time="10998.0" type="actend" person="1" link="link36" actType="home" />
<event time="10998.0" type="departure" person="1" link="link36" legMode="car" />
<event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1" />
</events>"""
Output:
start-ns ('k', 'some_namespace')
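For the original goal of walking a 2 GB file, one possible sketch combines the corrected events tuple with the common iterparse memory trick of clearing already-processed elements. The function name scan_events and the choice to count events by their type attribute are illustrative assumptions; replace the body with whatever analysis you actually need.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET


def scan_events(source):
    """Stream an <events> document, collecting namespace declarations and
    counting <event> elements by their 'type' attribute (hypothetical analysis).

    `source` may be a filename or a binary file object.
    """
    nsmap = {}
    counts = Counter()
    root = None
    # events must be a tuple of names, not a bare string.
    for event, elem in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = elem        # for start-ns, elem is a (prefix, uri) tuple
            nsmap[prefix] = uri
        elif event == "start" and root is None:
            root = elem               # the first start event is the root <events>
        elif event == "end" and elem.tag == "event":
            counts[elem.get("type")] += 1
            # Drop finished children so memory stays roughly flat on large files.
            root.clear()
    return nsmap, counts


sample = b"""<events version="1.0" xmlns:k="some_namespace">
<event time="10998.0" type="actend" person="1" link="link36" actType="home" />
<event time="10998.0" type="departure" person="1" link="link36" legMode="car" />
<event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1" />
</events>"""

nsmap, counts = scan_events(io.BytesIO(sample))
print(nsmap)          # {'k': 'some_namespace'}
print(dict(counts))   # {'actend': 1, 'departure': 1, 'PersonEntersVehicle': 1}
```

To run it on the real file, pass a filename instead, e.g. scan_events("data.xml") (hypothetical path); iterparse then opens the file in binary mode itself.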