How to extract data associated with an XML attribute using Python 3.2

Time: 2012-06-16 09:11:40

Tags: python xml-parsing

I have XML in this format:

<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
 <col name="Time"/>  
 <col name="Start of Frame">0.440708</col>  
 <col name="Channel">LIN 1</col>  
 <col name="Dir">Tx</col>  
 <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
 <col name="Frame Name">MasterReq_DB</col>  
 <col name="Id">3C</col>  
 <col name="Data">81 06 04 04 FF FF 50 4C</col>  
 <col name="Publisher">TestMaster (simulated)</col>  
 <col name="Checksum">D3 &quot;Classic&quot;</col>  
 <col name="Header Duration">2.090 ms (40.1 bits)</col>  
 <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
 <col name="Time difference">0.049987</col>  
 <empty/>  
</event>  

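In the XML above, I need to extract the data associated with the "name" attribute. I can get all of the names, but I can't get the MasterReq_DB value inside <col name="Frame Name">.
Please help. Thanks in advance.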

My Python code is:

import sys 
import array
import string
from xml.dom.minidom import parse,parseString
from xml.dom import minidom                                              
input_file = open("test_input.txt",'r')                                                
alines = input_file.read()
word_lst = alines.split("'")
filename = word_lst[1]
pathname=word_lst[3]                                               
f = open(pathname,'r')
doc = minidom.parse(f)
node = doc.documentElement
events = doc.getElementsByTagName('event')
for event in events:
    #print (event)
    columns =  event.getElementsByTagName('col')
    for column in columns:
        #print (column)
        head = column.getAttribute('name')
        if (head == ('Frame Name')):
           print (head)
           request = head.firstChild.wholeText
           print (request)
print ("DOne")
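For reference, the minidom loop above fails at head.firstChild because getAttribute('name') returns a plain string, not a DOM node; the text has to be read from the <col> element itself. A minimal sketch of the corrected loop, assuming a trimmed copy of the sample XML held in a string (frame_names is an illustrative helper name, not part of minidom):

```python
from xml.dom import minidom

# Trimmed copy of the sample from the question (assumption: held in a string for the demo)
XML = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Id">3C</col>
</event>"""


def frame_names(doc):
    """Return the text of every <col name="Frame Name"> element in the document."""
    names = []
    for event in doc.getElementsByTagName('event'):
        for column in event.getElementsByTagName('col'):
            # getAttribute() returns a str; the text node is a child of the element, not of that str
            if column.getAttribute('name') == 'Frame Name' and column.firstChild is not None:
                names.append(column.firstChild.data)
    return names


doc = minidom.parseString(XML)
print(frame_names(doc))  # ['MasterReq_DB']
```

With a file, minidom.parse(f) can be used in place of parseString, as in the question.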

1 answer:

Answer 0 (score: 1)

If you like, you can use lxml and XPath to get started:

In [1]: x = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
   ...:  <col name="Time"/>  
   ...:  <col name="Start of Frame">0.440708</col>  
   ...:  <col name="Channel">LIN 1</col>  
   ...:  <col name="Dir">Tx</col>  
   ...:  <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
   ...:  <col name="Frame Name">MasterReq_DB</col>  
   ...:  <col name="Id">3C</col>  
   ...:  <col name="Data">81 06 04 04 FF FF 50 4C</col>  
   ...:  <col name="Publisher">TestMaster (simulated)</col>  
   ...:  <col name="Checksum">D3 &quot;Classic&quot;</col>  
   ...:  <col name="Header Duration">2.090 ms (40.1 bits)</col>  
   ...:  <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
   ...:  <col name="Time difference">0.049987</col>  
   ...:  <empty/>  
   ...: </event> '''

In [2]: from lxml import etree

In [3]: tree = etree.fromstring(x)

In [4]: [elem.text for elem in tree.xpath('//*[@name]')]
Out[4]: 
[None,
 '0.440708',
 'LIN 1',
 'Tx',
 'LIN Frame (Diagnostic Request)',
 'MasterReq_DB',
 '3C',
 '81 06 04 04 FF FF 50 4C',
 'TestMaster (simulated)',
 'D3 "Classic"',
 '2.090 ms (40.1 bits)',
 '4.688 ms (90.0 bits)',
 '0.049987']

In [5]: [name for name in tree.xpath('//@name')]
Out[5]: 
['Time',
 'Start of Frame',
 'Channel',
 'Dir',
 'Event Type',
 'Frame Name',
 'Id',
 'Data',
 'Publisher',
 'Checksum',
 'Header Duration',
 'Resp. Duration',
 'Time difference']
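If you want each name paired with its value, a dictionary is convenient. A minimal sketch using the standard library's xml.etree.ElementTree instead of lxml (the same iter() and get() calls also exist on lxml elements), assuming the sample is held in a string; columns is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample XML (assumption: held in a string for the demo)
XML = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Start of Frame">0.440708</col>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Checksum">D3 &quot;Classic&quot;</col>
 <empty/>
</event>"""

root = ET.fromstring(XML)

# Map each <col>'s name attribute to its text; an empty element like <col name="Time"/> maps to None
columns = {col.get('name'): col.text for col in root.iter('col')}

print(columns['Frame Name'])  # MasterReq_DB
```

Looking values up by name this way avoids relying on the order of the two lists.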

To read from a file instead of a string, use the lxml.etree.parse function.
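A sketch of the file-based version with the standard library's ElementTree.parse, which, like lxml.etree.parse, accepts a filename or an open file object. The predicate .//col[@name='Frame Name'] is within the limited XPath subset ElementTree supports; the temporary file and the helper name frame_name_from_file are assumptions for the demo:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def frame_name_from_file(path):
    """Parse an XML file and return the text of the first <col name="Frame Name">, or None."""
    tree = ET.parse(path)  # filename or file object
    return tree.getroot().findtext(".//col[@name='Frame Name']")


# Demo: write a small sample to a temporary file, read it back, then clean up
sample = '<event><col name="Time"/><col name="Frame Name">MasterReq_DB</col></event>'
fd, path = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as f:
    f.write(sample)
try:
    result = frame_name_from_file(path)
finally:
    os.remove(path)

print(result)  # MasterReq_DB
```

If the file holds several <event> elements, findtext() returns only the first match; use findall() with the same path to collect them all.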

See the lxml tutorial, and the XPath syntax reference.