Extracting values from an XML file with Python based on a keyword search

Date: 2017-04-24 13:51:29

Tags: python xml parsing

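I am parsing an XML file with Python ElementTree / lxml (in PyDev).

Edit: the complete XML file:

[https://pastebin.com/embed_js/Gbrv9wgG]

I am trying to extract all signal names whose comment contains the keyword 'ROTARY'. The XML file contains many more 'PNIODEV' children, with or without 'CHANNEL' elements.

So far I have printed all the comments: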

import xml.etree.ElementTree as ET
tree=ET.parse('Project.xml')
root=tree.getroot()
for comments in root.iter('COMMENT'):
    print(comments.text)

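I cannot work out how to use lxml or ElementTree to search only for the keyword 'ROTARY' in all the comments and print the corresponding signal names. I tried the following code: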

for word in root.xpath('.//CHANNEL[COMMENT[contains(text(),"ROTARY")]]"/COMMENT/text()'):
    print(word)

I get no output...

I am new to Python and XML, so any help would be highly appreciated.

4 answers:

Answer 0: (score: 0)

As an alternative to xml.etree.ElementTree, you can use BeautifulSoup to parse the XML content.

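This code will:

  1. Take the XML content
  2. Create the soup
  3. Search for all <CHANNEL></CHANNEL> tags
  4. For each <CHANNEL> found, look for the word 'ROTARY' inside its <COMMENT> tag
  5. If the word 'ROTARY' is found, print the <SIGNALNAME> tag
  6. Sample code: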

    s = '''<PROJECT>
              <HARDWARE CONFIGURATION>
                 <PNIODEVICE>
                    <PNIOSLOT>               
                <CHANNEL>
                 <INDEX>2</INDEX>
                 <SUBADR>0</SUBADR>
                 <CHTYPE>E</CHTYPE>
                 <MASK>4</MASK>
                 <SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>
                 <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT>
                 </CHANNEL>
                <CHANNEL>
                 <INDEX>3</INDEX>
                 <SUBADR>0</SUBADR>
                 <CHTYPE>E</CHTYPE>
                 <MASK>8</MASK>
                 <SIGNALNAME>ELE+S1-BGI51.3</SIGNALNAME>
                 <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 1800SW</COMMENT>
                 </CHANNEL>
                <CHANNEL>
                 <INDEX>4</INDEX>
                 <SUBADR>0</SUBADR>
                 <CHTYPE>E</CHTYPE>
                 <MASK>10</MASK>
                 <SIGNALNAME>ELE+S1-BGI51.4_4C</SIGNALNAME>
                 <COMMENT>ROTARY TRANSFER TRANSPORT ARM RIGHT 00R</COMMENT>
                 </CHANNEL>
            </PNIOSLOT>
            </PNIODEV>
            </HARDWARE>
    </PROJECT>'''
    
    from bs4 import BeautifulSoup
    
    soup = BeautifulSoup(s, 'lxml')
    
    channel_tags = soup.find_all('channel')
    
    for channel in channel_tags:
        if 'ROTARY' in channel.comment.text:
            print(channel.signalname)
    

    Output:

    <signalname>ELE+S1-BGI51.2</signalname>
    <signalname>ELE+S1-BGI51.3</signalname>
    <signalname>ELE+S1-BGI51.4_4C</signalname>
    

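    Edit

    If a <CHANNEL> has no <COMMENT>, channel.comment is None and accessing .text raises an AttributeError. You can get around it with a try/except statement:

    for channel in channel_tags:
        try:
            if 'ROTARY' in channel.comment.text:
                print(channel.signalname)
        except AttributeError:
            # this channel has no <COMMENT> tag; skip it
            continue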
    

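Answer 1: (score: 0)

You can get the output with ElementTree alone. As the documentation explains (I suggest you read it), https://docs.python.org/2/library/xml.etree.elementtree.html, child elements are nested and a specific child node can be accessed by index.

So you can do this:

for i in root[0][0][0]:  # loop over the CHANNEL elements
    if 'ROTARY' in i[5].text:  # i[5] is COMMENT
        print(i[4].text)  # i[4] is the corresponding SIGNALNAME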
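
To make the index chain concrete, here is a minimal sketch on a trimmed-down sample. The inline XML is an assumption, not the real Project.xml: the tag names follow the question, but the second channel and its comment are made up for illustration. It only shows that root[0][0][0] walks down to PNIOSLOT, and that positions 4 and 5 are SIGNALNAME and COMMENT for this particular layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample; the real file may be laid out differently.
SAMPLE = """<PROJECT>
  <HARDWARE>
    <PNIODEVICE>
      <PNIOSLOT>
        <CHANNEL>
          <INDEX>2</INDEX>
          <SUBADR>0</SUBADR>
          <CHTYPE>E</CHTYPE>
          <MASK>4</MASK>
          <SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>
          <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT>
        </CHANNEL>
        <CHANNEL>
          <INDEX>3</INDEX>
          <SUBADR>0</SUBADR>
          <CHTYPE>E</CHTYPE>
          <MASK>8</MASK>
          <SIGNALNAME>ELE+S1-BGI51.9</SIGNALNAME>
          <COMMENT>SPARE INPUT</COMMENT>
        </CHANNEL>
      </PNIOSLOT>
    </PNIODEVICE>
  </HARDWARE>
</PROJECT>"""

root = ET.fromstring(SAMPLE)

# PROJECT -> HARDWARE -> PNIODEVICE -> PNIOSLOT
slot = root[0][0][0]
slot_tag = slot.tag

# Tags of the first CHANNEL's children, in document order
child_tags = [child.tag for child in slot[0]]

# Positional lookup: index 4 is SIGNALNAME, index 5 is COMMENT in this layout
rotary_signals = [ch[4].text for ch in slot if 'ROTARY' in ch[5].text]

print(slot_tag)
print(child_tags)
print(rotary_signals)
```

Positional access like this breaks as soon as a CHANNEL omits a child or the order changes, so looking children up by tag name is usually the safer choice for a file with optional elements.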

Answer 2: (score: 0)

Your XML contains an invalid character, &, which you can replace with &amp;
After fixing the XML, you can use:

import xml.etree.ElementTree as ET
tree=ET.parse('xml_test.xml')
for channel in tree.findall('.//CHANNEL'):
    if channel.find('COMMENT') is not None:
        comment = channel.find('COMMENT')
        if comment.text is not None:
            if "ROTARY" in comment.text:
                print(channel.find('SIGNALNAME').text)

Output:

ELE+S1-BGI51.0_6C
ELE+S1-BGI51.1_6C
ELE+S1-BGI51.2
ELE+S1-BGI51.3
ELE+S1-BGI51.4_4C
ELE+S1-BGI51.5_4C
ELE+S1-BGI51.6
ELE+S1-BGI51.7
ELE+S1-BGI52.0
...
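
If editing the file by hand is not practical, the replacement of & with &amp; can also be done in code before parsing. The following is only a sketch under assumptions: the helper name escape_bare_ampersands and the sample fragment are made up for illustration, and the regular expression only skips things that look like entities (such as &amp; or &#38;), so it will not repair other kinds of malformed XML.

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does not already start something shaped like an entity.
BARE_AMPERSAND = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(xml_text):
    """Replace unescaped '&' with '&amp;' so the parser accepts the text."""
    return BARE_AMPERSAND.sub('&amp;', xml_text)


# Hypothetical fragment with an unescaped '&' inside a comment.
broken = """<PNIOSLOT>
  <CHANNEL>
    <SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>
    <COMMENT>ROTARY TRANSFER & RADIAL ALIGNMENT</COMMENT>
  </CHANNEL>
</PNIOSLOT>"""

# Parsing 'broken' directly would raise ET.ParseError; the cleaned text parses.
root = ET.fromstring(escape_bare_ampersands(broken))
comment_text = root.find('CHANNEL/COMMENT').text
print(comment_text)
```

For a file on disk, the same idea would be to read the whole file as text, pass it through the helper, and hand the result to ET.fromstring instead of calling ET.parse on the path.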

Answer 3: (score: 0)

Using XPath

import xml.etree.ElementTree as ET
root = ET.parse('Project.xml').getroot()
all_items = root.findall("HARDWARE/PNIODEVICE/PNIOSLOT/CHANNEL")
lines = [item.find('SIGNALNAME').text for item in all_items if 'ROTARY' in item.find('COMMENT').text]
print(lines)

Edited: you have to allow for channels that may have no COMMENT tag!

import xml.etree.ElementTree as ET
root = ET.parse('project.xml').getroot()
all_items = root.findall("HARDWARE/PNIODEVICE/PNIOSLOT/CHANNEL")
# skip channels without a COMMENT child before testing its text
lines = [item.find('SIGNALNAME').text for item in all_items if item.find('COMMENT') is not None and 'ROTARY' in item.find('COMMENT').text]
print(lines)
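
A COMMENT element can also be present but empty, in which case its text is None and the membership test above would fail. A possible variant, sketched here under assumptions (the function name rotary_signal_names and the inline sample are hypothetical, and the second and third channels are made up to exercise the edge cases), uses the [COMMENT] predicate from ElementTree's XPath subset together with findtext and a default value:

```python
import xml.etree.ElementTree as ET


def rotary_signal_names(root, keyword='ROTARY'):
    """Return the SIGNALNAME text of every CHANNEL whose COMMENT contains keyword."""
    names = []
    # './/CHANNEL[COMMENT]' keeps only CHANNEL elements that have a COMMENT child.
    for channel in root.findall('.//CHANNEL[COMMENT]'):
        # findtext gives '' rather than None when <COMMENT/> has no text.
        if keyword in channel.findtext('COMMENT', default=''):
            names.append(channel.findtext('SIGNALNAME'))
    return names


# Hypothetical sample: one match, one channel without COMMENT, one empty COMMENT.
SAMPLE = """<PROJECT>
  <HARDWARE>
    <PNIODEVICE>
      <PNIOSLOT>
        <CHANNEL>
          <SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>
          <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT>
        </CHANNEL>
        <CHANNEL>
          <SIGNALNAME>ELE+S1-BGI52.0</SIGNALNAME>
        </CHANNEL>
        <CHANNEL>
          <SIGNALNAME>ELE+S1-BGI52.1</SIGNALNAME>
          <COMMENT/>
        </CHANNEL>
      </PNIOSLOT>
    </PNIODEVICE>
  </HARDWARE>
</PROJECT>"""

root = ET.fromstring(SAMPLE)
result = rotary_signal_names(root)
print(result)
```

The './/CHANNEL' form searches at any depth, so it does not depend on the exact HARDWARE/PNIODEVICE/PNIOSLOT nesting, which may be convenient if the tag names in the real file differ from the path used above.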