Error reading an XML file with xml.etree

Date: 2016-03-15 07:12:07

Tags: python xml lxml xml.etree celementtree

I am trying to read an XML file in Python using xml.etree, but for some files I get a MemoryError while parsing. The XML file is 912 MB. Is the problem related to the file size?

Error:

Traceback (most recent call last):
  File "<pyshell#3>", line 2, in <module>
    tree = ElementTree.parse(f1)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 653, in parse
    data = source.read(65536)
MemoryError
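ElementTree.parse builds the complete tree in memory before returning, so the tree for a 912 MB file can need considerably more RAM than the file size itself, which a 32-bit Python 2.7 process (if that is what is installed in C:\Python27) may not be able to provide. A minimal sketch of the streaming alternative, assuming Python 3 and a hypothetical helper `count_tag_streaming`, parses incrementally with `iterparse` and clears the root after each matching element so finished elements do not accumulate:

```python
import io
import xml.etree.ElementTree as ET


def count_tag_streaming(source, tag):
    """Count elements named `tag` without keeping the whole tree in memory.

    `source` may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a reference to it.
    _, root = next(context)
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Detach everything parsed so far from the root so memory stays bounded.
            root.clear()
    return count


# A small in-memory stand-in for the real 912 MB file.
sample = b"<report>" + b"<document><x>1</x></document>" * 1000 + b"</report>"
print(count_tag_streaming(io.BytesIO(sample), "document"))  # prints 1000
```

The same pattern works with a filename in place of the `BytesIO` object.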

Update: Following several suggestions, I tried lxml.

Code:

from lxml import etree
context = etree.iterparse('F:\\Reports\\Logs\\AppPerfect_States\\TG1_GM\\Result_TG1_V16.xml', tag="document")
for event, element in context:
    for child in element:
        print child.tag, child.text
    element.clear()  # release the finished element's children

Error:

C:\Python27\python.exe "F:/Py Projects/V16_AUTO/test1/xmlparsingtest1.py"
Traceback (most recent call last):
  File "F:/Py Projects/V16_AUTO/test1/xmlparsingtest1.py", line 3, in <module>
    for event, element in context:
  File "iterparse.pxi", line 207, in lxml.etree.iterparse.__next__ (src\lxml\lxml.etree.c:126137)
lxml.etree.XMLSyntaxError: unknown error, line 7530730, column 33
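An XMLSyntaxError at line 7530730 most likely means lxml streamed through over seven million lines and then hit content that is not well-formed XML, so at that point the problem is the data rather than memory. One way to look at that line without loading the file is to skip to it lazily. The helper `nth_line` is a hypothetical sketch for Python 3, and the file encoding in the usage comment is an assumption:

```python
import itertools


def nth_line(lines, lineno):
    """Return the 1-based `lineno`-th item of an iterable of lines, or None.

    Only one line is held in memory at a time, so this also works on a file
    object for a multi-gigabyte file.
    """
    return next(itertools.islice(lines, lineno - 1, lineno), None)


# Hypothetical usage with the file from the question (encoding is assumed):
# with open(r"F:\Reports\Logs\AppPerfect_States\TG1_GM\Result_TG1_V16.xml",
#           encoding="utf-8", errors="replace") as f:
#     print(repr(nth_line(f, 7530730)))
```

Printing with `repr` makes stray control characters or unescaped `&`/`<` characters near the reported column easier to spot.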

UPDATE 2: I tried cElementTree.

Code:

import xml.etree.cElementTree as etree
xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'
context = etree.iterparse(xmL, events=("start", "end"))
context = iter(context)
event, root = context.next()  # first event: start of the root element
for event, elem in context:
    # event is always "start" or "end", never a tag name, so this branch never
    # runs and root.clear() is never reached: the whole tree stays in memory
    if event == 'TasksReportNode':
        print elem.tag
        print elem.text
        root.clear()

Error:

Exception MemoryError:  in  ignored
Exception MemoryError:  in  ignored
Exception MemoryError:  in  ignored
Exception MemoryError:  in  ignored
Exception MemoryError:  in  ignored
MemoryError
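In the cElementTree attempt, `event == 'TasksReportNode'` can never be true, because `iterparse` puts the event name ("start" or "end") in `event` and the element name in `elem.tag`. A corrected sketch for Python 3, where `xml.etree.cElementTree` no longer exists (it was removed in 3.9) and `xml.etree.ElementTree` is used instead; the generator name `tasks_report_nodes` and the sample document are hypothetical:

```python
import io
import xml.etree.ElementTree as ET


def tasks_report_nodes(source, tag="TasksReportNode"):
    """Yield (tag, text) for each `tag` element while streaming the input."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, elem in context:
        # `event` is "start" or "end"; the element name lives in elem.tag.
        if event == "end" and elem.tag == tag:
            yield elem.tag, elem.text
            root.clear()  # free everything parsed so far


# Usage with a made-up document (the real file's structure is not shown).
sample = (b"<Root><TasksReportNode>a</TasksReportNode>"
          b"<TasksReportNode>b</TasksReportNode></Root>")
for name, text in tasks_report_nodes(io.BytesIO(sample)):
    print(name, text)
```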

2 answers:

Answer 0 (score: 0)

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file="xyz.xml")

for elem in tree.iter():
    print elem.attrib

Try reading your file with this code; it may help.
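`ET.ElementTree(file=...)` also parses the whole file into memory before `iter()` runs, so for a 912 MB file it can fail the same way as `ElementTree.parse`. A streaming variant of the same attribute dump, sketched for Python 3 with a hypothetical generator `iter_attribs`, reads attributes at "start" events (iterparse guarantees they are available there) and discards each finished child of the root:

```python
import io
import xml.etree.ElementTree as ET


def iter_attribs(source):
    """Yield each element's attributes in document order, like tree.iter()."""
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem
            depth += 1
            # Copy the dict: root.clear() later empties root's attrib in place.
            yield dict(elem.attrib)
        else:
            depth -= 1
            if depth == 1:
                # A direct child of the root has finished: drop it.
                root.clear()


sample = b'<a id="1"><b k="v"/><c/></a>'
for attrib in iter_attribs(io.BytesIO(sample)):
    print(attrib)
```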

Answer 1 (score: 0)

Here is what I tried: I used lxml.

from lxml import etree
xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'

# Only "end" events: an element's children are guaranteed to be parsed by then
context = etree.iterparse(xmL, events=("end",))
for event, element in context:
    if element.tag == 'TasksReportNode':
        for child1 in element:
            for child2 in child1:
                if child2.get("RowCount") == "0":
                    for child3 in child2:
                        print(child3.tag, child3.text)
        element.clear()  # discard the element once it has been processed
del context

I was able to parse all the tags and retrieve the data I needed.
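The same nested filter can be written with the standard library instead of lxml, so it runs without extra packages. This sketch assumes Python 3; the generator name `rows_with_zero_count` and the intermediate element names in the sample (`Group`, `Table`, `Name`) are made up, since the real file's structure is not shown. It handles each `TasksReportNode` only at its "end" event, when its subtree is complete, and then clears the root:

```python
import io
import xml.etree.ElementTree as ET


def rows_with_zero_count(source):
    """Yield (tag, text) of the children of RowCount="0" elements that sit two
    levels below a TasksReportNode, streaming the input."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root element
    for event, element in context:
        if event != "end" or element.tag != "TasksReportNode":
            continue
        for child1 in element:
            for child2 in child1:
                if child2.get("RowCount") == "0":
                    for child3 in child2:
                        yield child3.tag, child3.text
        root.clear()  # discard everything parsed so far


sample = (b'<Report><TasksReportNode><Group>'
          b'<Table RowCount="0"><Name>t1</Name></Table>'
          b'<Table RowCount="3"><Name>t2</Name></Table>'
          b'</Group></TasksReportNode></Report>')
print(list(rows_with_zero_count(io.BytesIO(sample))))
```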