I want to parse an XML string that has a Topics tag as the parent tag and Topic1, Topic2 as child tags.
<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>
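I just want to parse this XML so that I can get the attribute value of each Topic tag, and I want to do it inside a for loop.
I have tried the following code: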
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file='sample.xml')
# get the root element
root = tree.getroot()
namespace = {'xmlns': 'urn:reuterscompanycontent:significantdevelopments03'}
for devs in root.findall('xmlns:Topics', namespace):
    for child_tags in devs.findall('xmlns:./', namespace):
        print 'child: ', child_tags.tag
I just want to add some kind of wildcard like Topic/d on the second-to-last line, so that I can parse every tag that matches Topic.
Answer 0 (score: 1)
You can check whether the tag attribute starts with the namespace plus the prefix Topic, for example:
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
# keep only the grandchildren of the root whose qualified tag starts with "{namespace}Topic"
topics = [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]
for topic in topics:
    print(topic.text)
Or, shorter:
from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8"?><SignificantDevelopments Major="3" Minor="0" Revision="1" xmlns="urn:reuterscompanycontent:significantdevelopments03"><Topics><Topic1 Code="254">Regulatory / Company Investigation</Topic1><Topic2 Code="207">Mergers &amp; Acquisitions</Topic2><ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1><ParentTopic2 Code="4">Ownership / Control</ParentTopic2></Topics></SignificantDevelopments>')
# filter and iterate in a single statement
for topic in [el for el in root.findall('*/*') if el.tag.startswith('{urn:reuterscompanycontent:significantdevelopments03}Topic')]:
    print(topic.text)
Or move the check into an if statement inside the for loop.
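A minimal sketch of that last variant, with the check in an if statement inside the for loop. Since the question asks for the attribute values, it also reads the Code attribute with el.get('Code'). The names NS, XML and topic_codes are made up for this illustration, and the XML is a shortened copy of the sample from the question (with the ampersand escaped as &amp;amp;, which the parser requires).

```python
from xml.etree import ElementTree as ET

# Namespace URI from the sample document, in ElementTree's "{uri}" tag notation.
NS = '{urn:reuterscompanycontent:significantdevelopments03}'

# Shortened copy of the sample XML; note the escaped ampersand.
XML = (
    '<SignificantDevelopments xmlns="urn:reuterscompanycontent:significantdevelopments03">'
    '<Topics>'
    '<Topic1 Code="254">Regulatory / Company Investigation</Topic1>'
    '<Topic2 Code="207">Mergers &amp; Acquisitions</Topic2>'
    '<ParentTopic1 Code="6">Litigation / Regulatory</ParentTopic1>'
    '</Topics>'
    '</SignificantDevelopments>'
)


def topic_codes(xml_text):
    """Return (local tag name, Code attribute, text) for each Topic* element."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.findall('*/*'):
        # The check lives inside the loop; ParentTopic1 is skipped because
        # its qualified tag starts with "{namespace}Parent", not "{namespace}Topic".
        if el.tag.startswith(NS + 'Topic'):
            results.append((el.tag[len(NS):], el.get('Code'), el.text))
    return results


for name, code, text in topic_codes(XML):
    print(name, code, text)
```

Returning tuples instead of printing inside the loop keeps the filtering reusable; if only the codes are needed, el.get('Code') alone is enough.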