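I am trying to parse XML files larger than 1 GB, so I am using iterparse, but I cannot reach the second-level children. With the code below I can get the children of elem but not the children of child1, i.e. the child2 loop is never entered.
Code: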
import xml.etree.cElementTree as ET

xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'
count = 0
flag = 0
for event, elem in ET.iterparse(xmL,):
    if event == 'end':
        if elem.tag == 'TasksReportNode':
            count += 1
            for child1 in elem:
                print(child1.tag, child1.text)
                for child2 in child1:
                    print(child2.tag, child2.text)
    elem.clear()  # discard the element
print(count)
Excerpt of the XML file:
<TasksReportNode Name="Task15">
  <TableData NumRows="97" NumColumns="15">
    <TableRow RowCount="0">
      <TableColumn Name="Task"><![CDATA[ Task15 [GET - /PULSEV31/appView/projectFeedHidden.jsp - 200]]]></TableColumn>
      <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
      <TableColumn Name="Successful"><![CDATA[96]]></TableColumn>
      <TableColumn Name="Failed"><![CDATA[0]]></TableColumn>
      <TableColumn Name="Timedout"><![CDATA[0]]></TableColumn>
      <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
      <TableColumn Name="Min(ms)"><![CDATA[15]]></TableColumn>
      <TableColumn Name="Avg(ms)"><![CDATA[24.20]]></TableColumn>
      <TableColumn Name="Avg-90%(ms)"><![CDATA[54.55]]></TableColumn>
      <TableColumn Name="90%ile(ms)"><![CDATA[89.98]]></TableColumn>
      <TableColumn Name="95%ile(ms)"><![CDATA[95.24]]></TableColumn>
      <TableColumn Name="99%ile(ms)"><![CDATA[99.45]]></TableColumn>
      <TableColumn Name="Max(ms)"><![CDATA[94]]></TableColumn>
      <TableColumn Name="Std. Dev."><![CDATA[15.74]]></TableColumn>
      <TableColumn Name="Bytes Recd(KB)"><![CDATA[192]]></TableColumn>
    </TableRow>
  </TableData>
  <TableData NumRows="1" NumColumns="2">
    <TableRow RowCount="0">
      <TableColumn Name="Response Time Interval (ms)"><![CDATA[0 - 99]]></TableColumn>
      <TableColumn Name="Frequency"><![CDATA[96]]></TableColumn>
    </TableRow>
  </TableData>
</TasksReportNode>
<TasksReportNode Name="Task16">
  <TableData NumRows="97" NumColumns="15">
    <TableRow RowCount="0">
      <TableColumn Name="Task"><![CDATA[ Task16 [GET - /PULSEV31/appView/projectCommentHidden.jsp - 200]]]></TableColumn>
      <TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
      <TableColumn Name="Successful"><![CDATA[96]]></TableColumn>
      <TableColumn Name="Failed"><![CDATA[0]]></TableColumn>
      <TableColumn Name="Timedout"><![CDATA[0]]></TableColumn>
      <TableColumn Name="Total"><![CDATA[96]]></TableColumn>
      <TableColumn Name="Min(ms)"><![CDATA[15]]></TableColumn>
      <TableColumn Name="Avg(ms)"><![CDATA[22.73]]></TableColumn>
      <TableColumn Name="Avg-90%(ms)"><![CDATA[54.55]]></TableColumn>
      <TableColumn Name="90%ile(ms)"><![CDATA[90.93]]></TableColumn>
      <TableColumn Name="95%ile(ms)"><![CDATA[96.25]]></TableColumn>
      <TableColumn Name="99%ile(ms)"><![CDATA[100.50]]></TableColumn>
      <TableColumn Name="Max(ms)"><![CDATA[109]]></TableColumn>
      <TableColumn Name="Std. Dev."><![CDATA[14.76]]></TableColumn>
      <TableColumn Name="Bytes Recd(KB)"><![CDATA[192]]></TableColumn>
    </TableRow>
  </TableData>
</TasksReportNode>
Answer 0 (score: 0)
Here is what I tried: I used lxml instead of cElementTree.
from lxml import etree

xmL = 'F:\\Reports\\Logs\\Result_TG1_V16.xml'
context = etree.iterparse(xmL, events=("start", "end"),)
for event, element in context:
    # Read the node only on its "end" event, once all descendants are parsed
    if event == "end" and element.tag == 'TasksReportNode':
        for child1 in element:  # TableData
            for child2 in child1:  # TableRow
                if child2.get("RowCount") == "0":
                    for child3 in child2:  # TableColumn
                        print(child3.tag, child3.text)
        element.clear()  # discard the element
del context
I was able to get all the child tags and data.
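For comparison, the same result seems reachable with the standard library alone. The indentation of the question's code was lost, so this is a guess, but if clear() ran on every "end" event, each TableData would already have been emptied by the time its parent TasksReportNode ended, which would match the symptom of child1 having no children. The sketch below clears only the TasksReportNode itself, after it has been read. The `<Report>` root, the shortened sample data and the helper name `iter_task_columns` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical miniature of the report, wrapped in a made-up <Report> root.
SAMPLE = b"""<Report>
<TasksReportNode Name="Task15">
<TableData NumRows="1" NumColumns="2">
<TableRow RowCount="0">
<TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
<TableColumn Name="Total"><![CDATA[96]]></TableColumn>
</TableRow>
</TableData>
</TasksReportNode>
<TasksReportNode Name="Task16">
<TableData NumRows="1" NumColumns="1">
<TableRow RowCount="0">
<TableColumn Name="Status"><![CDATA[Success]]></TableColumn>
</TableRow>
</TableData>
</TasksReportNode>
</Report>"""


def iter_task_columns(source):
    """Yield (task name, [(column name, text), ...]) per TasksReportNode."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "TasksReportNode":
            continue  # leave descendants intact until their parent ends
        columns = [(col.get("Name"), col.text) for col in elem.iter("TableColumn")]
        yield elem.get("Name"), columns
        elem.clear()  # free the subtree only after it has been read


results = list(iter_task_columns(io.BytesIO(SAMPLE)))
for name, columns in results:
    print(name, columns)
```

For the real file, pass the path from the question instead of the BytesIO object. Note that the emptied TasksReportNode elements still stay attached to the root, so memory is not freed completely.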