ParseError: not well-formed (invalid token)

Time: 2019-01-09 15:01:11

Tags: python xml python-3.x xml-parsing

I have a large XML file (> 1.5 GB) that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<events version="1.0">
    <event time="0.0" type="actend" person="94324001" link="119380" actType="home_94200.0"  />
    <event time="0.0" type="departure" person="94324001" link="119380" legMode="bicycle"  />
    <event time="0.0" type="actend" person="93120501" link="116274" actType="home_94800.0"  />
    <event time="0.0" type="departure" person="93120501" link="116274" legMode="bicycle"  />
    <event time="0.0" type="actend" person="84637601" link="72152" actType="home_90600.0"  />
    <event time="0.0" type="departure" person="84637601" link="72152" legMode="ride"  />
    <event time="0.0" type="actend" person="78914201" link="49600" actType="home_91800.0"  />
    <event time="0.0" type="departure" person="78914201" link="49600" legMode="access_walk"  />
    <event time="0.0" type="actend" person="74265301" link="48593" actType="home_96000.0"  />
    ....
</events>

When I try to parse it with the following code:

import xml.etree.ElementTree as ET
import gzip
# Parsing Event XML and saving in a list
def gzipedXMLparser(filename):

    vehicleIDs = []
    data = gzip.open(filename, mode="rb")

    datatoparse = ET.iterparse(filename, events = ("start", "end"), parser = ET.XMLParser(encoding = 'utf-8'))
    datatoparse = iter(datatoparse)
    event, root = datatoparse.__next__()

    for event, elem in datatoparse:
        if event == "end" and elem.tag == "event":
            if elem.attrib["type"] == "vehicle enters traffic":
                if elem.attrib["vehicle"] in vehicleIDs:
                    pass
                else:
                    vehicleIDs.append(elem.attrib)
            elem.clear

            root.clear()
    print(vehicleIDs)
    return vehicleIDs

I get the following error message:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

Can someone explain what the problem is and how to fix it?

2 answers:

Answer 0: (score: 0)

It looks like your XML may contain some invalid characters. In any case, you can check the related question "ParseError: not well-formed (invalid token) using cElementTree".
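
To check whether a file really contains an invalid token, and where, one option is to stream it through the parser and read the position off the exception. The following is only a sketch: the helper name `find_parse_error` and the two in-memory sample documents are made up for illustration, and the second sample deliberately embeds a NUL byte, which XML 1.0 does not allow.

```python
import io
import xml.etree.ElementTree as ET


def find_parse_error(source):
    """Stream-parse `source` (a path or binary file object).

    Returns the (line, column) reported by the parser for the first
    ParseError, or None if the whole document parses cleanly.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # discard element contents to limit memory use
    except ET.ParseError as err:
        return err.position
    return None


# Hypothetical sample documents, not taken from the question's file.
good = io.BytesIO(b'<events><event time="0.0"/></events>')
bad = io.BytesIO(b'<events>\n<event time="0.0" type="a\x00b"/></events>')

print(find_parse_error(good))
print(find_parse_error(bad))
```

If the reported position is line 1, column 0, the very first bytes of the input are already unacceptable to the parser, which usually points at something other than a stray character deep inside the file.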

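Answer 1: (score: 0)

The problem was the XML file itself: it contained an error somewhere. I downloaded it again from another location and it worked fine.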
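
Separately from the damaged download, one detail in the question's code may be worth checking: it opens the file with `gzip.open` but then passes `filename`, not the opened handle, to `ET.iterparse`. If the file on disk is gzip-compressed, the parser would receive compressed bytes, which could also fail at line 1, column 0. Below is a minimal sketch of a streaming parse that hands the decompressed stream to the parser. The names `open_maybe_gzip` and `collect_vehicle_ids` are hypothetical, and the `vehicle` attribute and the `"vehicle enters traffic"` event type are taken from the question's code rather than from the sample XML shown.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def open_maybe_gzip(filename):
    """Open `filename` for binary reading, decompressing if it is gzip."""
    with open(filename, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(filename, "rb")
    return open(filename, "rb")


def collect_vehicle_ids(filename):
    """Return the set of vehicle IDs seen in 'vehicle enters traffic' events."""
    vehicle_ids = set()
    with open_maybe_gzip(filename) as stream:
        # Pass the opened stream, not the file name, to the parser.
        context = ET.iterparse(stream, events=("start", "end"))
        _event, root = next(context)  # first event is the root's start
        for event, elem in context:
            if event == "end" and elem.tag == "event":
                if elem.get("type") == "vehicle enters traffic":
                    vehicle_id = elem.get("vehicle")
                    if vehicle_id is not None:
                        vehicle_ids.add(vehicle_id)
                root.clear()  # drop processed children to keep memory flat
    return vehicle_ids
```

Compared with the question's code, this sketch stores the IDs in a set, so the membership check and the stored value are the same thing, and it actually calls `clear()` (the original `elem.clear` without parentheses does nothing).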