I am using Python ElementTree / lxml (in PyDev) to parse an XML file.
Edit: the complete XML file:
[https://pastebin.com/embed_js/Gbrv9wgG]
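I am trying to extract all signal names whose comment contains the keyword 'ROTARY'. The XML file contains more 'PNIODEV' children, some with and some without 'CHANNEL' elements.
So far I have managed to print all the comments: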
import xml.etree.ElementTree as ET
tree = ET.parse('Project.xml')
root = tree.getroot()
for comments in root.iter('COMMENT'):
    print(comments.text)
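With neither lxml nor ElementTree have I been able to search all comments for the keyword 'ROTARY' and print the corresponding signal names. I used the following code: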
for word in root.xpath('.//CHANNEL[COMMENT[contains(text(),"ROTARY")]]"/COMMENT/text()'):
    print(word)
It produces no output...
Since I am new to both Python and XML, any help would be greatly appreciated.
Answer 0 (score: 0)
As an alternative to xml.etree.ElementTree, you can use BeautifulSoup to parse the XML content.
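This code will:
- Build a soup object from the XML.
- Find all the <CHANNEL></CHANNEL> tags.
- For each occurrence of the <CHANNEL> tag, search for the word 'ROTARY' inside its <COMMENT> tag.
- If 'ROTARY' is found, print the value at the <SIGNALNAME> tag.
Example code: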
s = '''<PROJECT>
  <HARDWARE CONFIGURATION>
    <PNIODEVICE>
      <PNIOSLOT>
        <CHANNEL>
          <INDEX>2</INDEX>
          <SUBADR>0</SUBADR>
          <CHTYPE>E</CHTYPE>
          <MASK>4</MASK>
          <SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>
          <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT>
        </CHANNEL>
        <CHANNEL>
          <INDEX>3</INDEX>
          <SUBADR>0</SUBADR>
          <CHTYPE>E</CHTYPE>
          <MASK>8</MASK>
          <SIGNALNAME>ELE+S1-BGI51.3</SIGNALNAME>
          <COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 1800SW</COMMENT>
        </CHANNEL>
        <CHANNEL>
          <INDEX>4</INDEX>
          <SUBADR>0</SUBADR>
          <CHTYPE>E</CHTYPE>
          <MASK>10</MASK>
          <SIGNALNAME>ELE+S1-BGI51.4_4C</SIGNALNAME>
          <COMMENT>ROTARY TRANSFER TRANSPORT ARM RIGHT 00R</COMMENT>
        </CHANNEL>
      </PNIOSLOT>
    </PNIODEV>
  </HARDWARE>
</PROJECT>'''
from bs4 import BeautifulSoup
# The lenient 'lxml' HTML parser lowercases tag names, hence 'channel' below.
soup = BeautifulSoup(s, 'lxml')
channel_tags = soup.find_all('channel')
for channel in channel_tags:
    if 'ROTARY' in channel.comment.text:
        print(channel.signalname)
Output:
<signalname>ELE+S1-BGI51.2</signalname>
<signalname>ELE+S1-BGI51.3</signalname>
<signalname>ELE+S1-BGI51.4_4C</signalname>
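Edit:
If some <CHANNEL> tags have no <COMMENT>, channel.comment is None and accessing .text raises an AttributeError. You can get around that with a try/except statement.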
for channel in channel_tags:
    try:
        if 'ROTARY' in channel.comment.text:
            print(channel.signalname)
    except AttributeError:
        # This channel has no <COMMENT> tag; skip it.
        continue
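Answer 1 (score: 0)
You can get the output using ElementTree alone. As the documentation describes (I suggest you read it) - https://docs.python.org/2/library/xml.etree.elementtree.html: child elements are nested, and we can access specific child nodes by index.
So you can do this: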
for i in root[0][0][0]:  # looping over CHANNELs
    if 'ROTARY' in i[5].text:  # if 'ROTARY' is in COMMENT
        print(i[4].text)  # print corresponding SIGNALNAME
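To make the index arithmetic above easier to follow, here is a minimal sketch of how the positions map onto element names. It assumes a cut-down, well-formed version of the sample XML (one channel, plain `HARDWARE` and `PNIODEVICE` tags); the real file may be laid out differently, in which case the indices will not match.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified sample: one device, one slot, one channel.
sample = (
    '<PROJECT><HARDWARE><PNIODEVICE><PNIOSLOT>'
    '<CHANNEL>'
    '<INDEX>2</INDEX><SUBADR>0</SUBADR><CHTYPE>E</CHTYPE><MASK>4</MASK>'
    '<SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME>'
    '<COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT>'
    '</CHANNEL>'
    '</PNIOSLOT></PNIODEVICE></HARDWARE></PROJECT>'
)

root = ET.fromstring(sample)

# root -> HARDWARE -> PNIODEVICE -> PNIOSLOT
slot = root[0][0][0]
channel = slot[0]

print(slot.tag)
# Position of every child inside the channel, e.g. 4 -> SIGNALNAME, 5 -> COMMENT
for position, child in enumerate(channel):
    print(position, child.tag)
```

Because the lookup is purely positional, a channel with a missing or reordered child would shift these indices, so this approach is probably best reserved for files with a fixed layout.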
Answer 2 (score: 0)
Your XML contains the invalid character &, which you can replace with &amp;.
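If editing the file by hand is impractical, the replacement can be scripted. The following is only a sketch: the helper name `escape_bare_ampersands` and the sample string are made up for illustration, and it assumes the only problem is a bare & in element text. It leaves existing references such as &amp; or &#38; untouched, but it would not help with other kinds of malformed markup.

```python
import re
import xml.etree.ElementTree as ET

# An '&' that does NOT already start an entity or character reference
# (for example &amp;, &#38; or &#x26;).
BARE_AMPERSAND = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(xml_text):
    """Return xml_text with every bare '&' replaced by '&amp;'."""
    return BARE_AMPERSAND.sub('&amp;', xml_text)


# Hypothetical snippet: one bare '&' and one that is already escaped.
raw = '<CHANNEL><COMMENT>ROTARY LIFT & LOWER &amp; HOLD</COMMENT></CHANNEL>'
fixed = escape_bare_ampersands(raw)

channel = ET.fromstring(fixed)  # parsing `raw` directly raises ParseError
print(channel.find('COMMENT').text)
```

For a real file you would read its text, pass it through the helper, and feed the result to `ET.fromstring` instead of calling `ET.parse` on the path.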
After fixing the XML, you can use:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_test.xml')
for channel in tree.findall('.//CHANNEL'):
    comment = channel.find('COMMENT')
    # Skip channels without a COMMENT element or with an empty one.
    if comment is not None and comment.text is not None:
        if "ROTARY" in comment.text:
            print(channel.find('SIGNALNAME').text)
Output:
ELE+S1-BGI51.0_6C
ELE+S1-BGI51.1_6C
ELE+S1-BGI51.2
ELE+S1-BGI51.3
ELE+S1-BGI51.4_4C
ELE+S1-BGI51.5_4C
ELE+S1-BGI51.6
ELE+S1-BGI51.7
ELE+S1-BGI52.0
...
Answer 3 (score: 0)
Using XPath:
import xml.etree.ElementTree as ET
root = ET.parse('Project.xml').getroot()
all_items = root.findall("HARDWARE/PNIODEVICE/PNIOSLOT/CHANNEL")
lines = [item.find('SIGNALNAME').text for item in all_items if 'ROTARY' in item.find('COMMENT').text]
print(lines)
Edited: you have to allow for channels that may not have a COMMENT tag!
import xml.etree.ElementTree as ET
root = ET.parse('project.xml').getroot()
all_items = root.findall("HARDWARE/PNIODEVICE/PNIOSLOT/CHANNEL")
lines = [item.find('SIGNALNAME').text for item in all_items if item.find('COMMENT') is not None and 'ROTARY' in item.find('COMMENT').text]
print(lines)
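The XPath subset supported by xml.etree.ElementTree has no contains() function, so the keyword test has to happen in Python. As a rough sketch, the filtering can be wrapped in a small function that also tolerates channels with a missing or empty COMMENT or SIGNALNAME. The function name `signal_names_with_keyword` and the `SPARE.*` entries in the sample are invented for illustration and do not come from the real file.

```python
import xml.etree.ElementTree as ET


def signal_names_with_keyword(root, keyword):
    """Collect SIGNALNAME texts of CHANNELs whose COMMENT contains keyword."""
    names = []
    for channel in root.iter('CHANNEL'):
        # findtext() returns None when the child is missing
        # and '' when the child exists but has no text.
        comment = channel.findtext('COMMENT')
        name = channel.findtext('SIGNALNAME')
        if comment and name and keyword in comment:
            names.append(name)
    return names


# Assumed sample: one matching channel plus three edge cases.
sample = '''<PROJECT><HARDWARE><PNIODEVICE><PNIOSLOT>
  <CHANNEL><SIGNALNAME>ELE+S1-BGI51.2</SIGNALNAME><COMMENT>ROTARY TRANSFER RADIAL ALIGNMENT 00SWIV</COMMENT></CHANNEL>
  <CHANNEL><SIGNALNAME>SPARE.1</SIGNALNAME></CHANNEL>
  <CHANNEL><SIGNALNAME>SPARE.2</SIGNALNAME><COMMENT/></CHANNEL>
  <CHANNEL><SIGNALNAME>SPARE.3</SIGNALNAME><COMMENT>LIFT TABLE UP</COMMENT></CHANNEL>
</PNIOSLOT></PNIODEVICE></HARDWARE></PROJECT>'''

result = signal_names_with_keyword(ET.fromstring(sample), 'ROTARY')
print(result)
```

Using `root.iter('CHANNEL')` instead of a fixed path means the function does not depend on how deeply the channels are nested, at the cost of also picking up any CHANNEL elements elsewhere in the document.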