Python: accessing XML files with an unusual structure in a folder

Time: 2017-06-24 18:55:34

Tags: python xml parsing

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><document DateTime="2017-06-23T04:27:08.592Z"><PeakInfo No="1" mz="505.2315648572003965" Intensity="4531.0000000000000000" Rel_Intensity="3.2737729673489735" Resolution="1879.5638812957554364" SNR="14.0278637770897561" Area="1348.1007591467391649" Rel_Area="2.3371194184605959" Index="238.9999999999976694"/><PeakInfo No="2" mz="522.1330917856538463" Intensity="3382.0000000000000000" Rel_Intensity="2.4435886505350317" Resolution="3502.9921209527169594" SNR="10.4705882352940982" Area="881.4468100654634100" Rel_Area="1.5281101521284057" Index="925.0000000000000000"/></document>

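Above is part of the XML file I need to parse. I watched some YouTube videos on how to parse and extract data from XML files, but for some reason what they cover doesn't seem to apply to my file. If I'm not mistaken, these PeakInfo entries are elements. However, I can't seem to access the mz and Intensity values for each PeakInfo No.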

import xml.etree.ElementTree as ET
import os

file_name = 'E7.xml'
full_file = os.path.abspath(os.path.join('xmllist', file_name))

pl = ET.parse(full_file)

peakinfos = pl.findall('PeakInfo')

for p in peakinfos:
    mz = p.find('mz')
    print(mz)

Above is the code I wrote based on some YouTube videos. Here I try to read the mz value of each PeakInfo element, but with no success. Is there any way to achieve this?
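A minimal illustration of why the loop above does not return the mz value, using a shortened, hypothetical copy of the XML with a single PeakInfo. find() searches for a child element named mz; PeakInfo has no child elements, because its data is stored in attributes of the element itself, so find() returns None.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the question's XML (one PeakInfo, two attributes).
xml_text = (
    '<document DateTime="2017-06-23T04:27:08.592Z">'
    '<PeakInfo No="1" mz="505.2315648572003965" Intensity="4531.0000000000000000"/>'
    '</document>'
)

root = ET.fromstring(xml_text)
peak = root.find('PeakInfo')

# find() looks for a *child element* called "mz"; there is none, so this is None.
child = peak.find('mz')
print(child)            # None

# PeakInfo has no children at all; mz is an attribute of the element itself.
print(len(peak))        # 0
print(peak.get('mz'))   # 505.2315648572003965
```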

Edit: print(pl) only gives: xml.etree.ElementTree.ElementTree object
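That output is expected: ET.parse() returns an ElementTree wrapper, and printing it shows only the object, not the XML. A minimal sketch, assuming io.StringIO in place of the question's E7.xml file so it runs without anything on disk, of how to reach the actual <document> element with getroot():

```python
import io
import xml.etree.ElementTree as ET

xml_text = (
    '<document DateTime="2017-06-23T04:27:08.592Z">'
    '<PeakInfo No="1" mz="505.2315648572003965"/>'
    '<PeakInfo No="2" mz="522.1330917856538463"/>'
    '</document>'
)

# ET.parse() accepts a path or a file-like object; StringIO stands in for E7.xml.
tree = ET.parse(io.StringIO(xml_text))
print(tree)  # prints only the ElementTree wrapper object, not the XML contents

# getroot() returns the top-level <document> Element.
root = tree.getroot()
print(root.tag)              # document
print(root.get('DateTime'))  # 2017-06-23T04:27:08.592Z

# Searching from the tree or from its root finds the same PeakInfo elements.
print(len(tree.findall('PeakInfo')), len(root.findall('PeakInfo')))  # 2 2
```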

1 Answer:

Answer 0 (score: 1)

s = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
       <document DateTime="2017-06-23T04:27:08.592Z">
           <PeakInfo No="1" mz="505.2315648572003965"
                     Intensity="4531.0000000000000000"
                     Rel_Intensity="3.2737729673489735"
                     Resolution="1879.5638812957554364"
                     SNR="14.0278637770897561"
                     Area="1348.1007591467391649"
                     Rel_Area="2.3371194184605959"
                     Index="238.9999999999976694"/>
           <PeakInfo No="2" mz="522.1330917856538463"
                     Intensity="3382.0000000000000000"
                     Rel_Intensity="2.4435886505350317"
                     Resolution="3502.9921209527169594"
                     SNR="10.4705882352940982"
                     Area="881.4468100654634100"
                     Rel_Area="1.5281101521284057"
                     Index="925.0000000000000000"/>
       </document>'''

import xml.etree.ElementTree as ET

root = ET.fromstring(s)
peakinfos = root.findall('PeakInfo')

findall looks for elements, but you are trying to access element attributes. Use attrib or get to access the values.

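for p in peakinfos:
    # get() returns None if the attribute is missing; attrib[...] raises KeyError
    print('mz is ...', p.get('mz'))
    print('mz is ...', p.attrib['mz'])
    # items() yields every (name, value) attribute pair of the element
    for k, v in p.items():
        print('{}: {}'.format(k, v))
    print('--------------------------')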
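Since the title asks about a folder of such files, here is a minimal sketch of applying the same get() approach to every .xml file in a directory. The folder name xmllist comes from the question; the helper names read_peaks and read_folder are hypothetical, and the sketch assumes every PeakInfo carries both an mz and an Intensity attribute (float(None) would raise otherwise). Attribute values are always strings, so they are converted to float explicitly.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_peaks(xml_file):
    """Hypothetical helper: (mz, Intensity) float pairs for every PeakInfo in one file."""
    root = ET.parse(xml_file).getroot()
    # Attribute values come back as strings, so convert them explicitly.
    return [
        (float(p.get('mz')), float(p.get('Intensity')))
        for p in root.findall('PeakInfo')
    ]


def read_folder(folder):
    """Hypothetical helper: map each *.xml file name in `folder` to its peaks."""
    return {
        path.name: read_peaks(path)
        for path in sorted(Path(folder).glob('*.xml'))
    }


if __name__ == '__main__':
    # 'xmllist' is the folder name used in the question.
    for name, peaks in read_folder('xmllist').items():
        print(name)
        for mz, intensity in peaks:
            print(f'  mz={mz}  Intensity={intensity}')
```

Using findall('PeakInfo') on the root only matches direct children of <document>, which is where PeakInfo sits in the sample; root.iter('PeakInfo') would also find deeper matches if the structure varied between files.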