I need to extract the value of an attribute from an XML document using Python.
For example, if I have an XML document like this:
<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>
How can I store the text 'smallHuman' or 'largeHuman' in a variable?
Edit: I'm very new to Python and may need a lot of help.
This is what I have tried so far:
#! /usr/bin/python
import xml.etree.ElementTree as ET

def walkTree(node):
    print node.tag
    print node.keys()
    print node.attributes[]
    for cn in list(node):
        walkTree(cn)

treeOne = ET.parse('tm1.xml')
treeTwo = ET.parse('tm3.xml')
walkTree(treeOne.getroot())
Because of how this script is used, I can't hard-code the XML into the .py file.
Answer 0 (score: 2)
With ElementTree, you can use the find method and the attrib dictionary.
Example:
import xml.etree.ElementTree as ET
z = """<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>"""
treeOne = ET.fromstring(z)
print(treeOne.find('./child').attrib['type'])
print(treeOne.find('./adult').attrib['type'])
Output:
smallHuman
largeHuman
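Since the question says the XML cannot be hard-coded, here is a minimal sketch of the same find-and-read approach against a file on disk. The helper name read_type and the tag "teenager" are hypothetical, and the sample document is written to a temporary file only so that the sketch is self-contained; in the real script you would pass the path to your own file (for example tm1.xml).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

XML_TEXT = """<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>"""


def read_type(path, tag):
    """Return the 'type' attribute of the first <tag> child of the root, or None."""
    root = ET.parse(path).getroot()
    elem = root.find(tag)      # find() returns None when no such child exists
    if elem is None:
        return None
    return elem.get("type")    # get() returns None when the attribute is missing


# Write the sample document to a temporary file (assumption: stands in for tm1.xml).
with tempfile.TemporaryDirectory() as tmp_dir:
    xml_path = os.path.join(tmp_dir, "tm1.xml")
    with open(xml_path, "w", encoding="utf-8") as handle:
        handle.write(XML_TEXT)

    child_type = read_type(xml_path, "child")
    adult_type = read_type(xml_path, "adult")
    missing_type = read_type(xml_path, "teenager")

print(child_type)
print(adult_type)
print(missing_type)
```

Using get("type") instead of attrib['type'] avoids a KeyError when an element has no such attribute, and checking the result of find() avoids an AttributeError when the element itself is absent.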
Answer 1 (score: 2)
To get an attribute value from XML, you can use the attrib dictionary or the get() method of an ElementTree element.
You can find more details and examples at the following link: https://docs.python.org/3.5/library/xml.etree.elementtree.html
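A minimal sketch of reading attribute values with the standard-library ElementTree module, using the sample document from the question. The helper name collect_attributes is hypothetical; it mirrors the idea of the question's walkTree function, but relies on iter() to visit every element instead of recursing by hand.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>"""


def collect_attributes(root):
    """Return (tag, attributes) pairs for root and all descendants, in document order."""
    return [(elem.tag, dict(elem.attrib)) for elem in root.iter()]


root = ET.fromstring(XML_TEXT)
pairs = collect_attributes(root)
for tag, attrs in pairs:
    print(tag, attrs)

# Keep only the 'type' values, skipping elements that have no such attribute.
types = [attrs["type"] for _, attrs in pairs if "type" in attrs]
print(types)
```

For a file on disk, ET.parse(path).getroot() could be passed to the same helper in place of ET.fromstring(XML_TEXT).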
Answer 2 (score: 0)
Another option is the lxml library, whose lxml.etree module follows the ElementTree API, so find() and get() work the same way.
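A rough sketch of what an lxml-based version might look like, using the sample document from the question. It is not the code originally posted with this answer. It assumes lxml is installed (pip install lxml) and falls back to the standard-library ElementTree module otherwise, which is possible because only calls shared by both APIs are used. The variable name human_types is hypothetical.

```python
try:
    from lxml import etree  # third-party package
except ImportError:
    # Fallback (assumption): the standard library offers the same calls used below.
    import xml.etree.ElementTree as etree

XML_TEXT = """<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>"""

root = etree.fromstring(XML_TEXT)

# Map each direct child's tag to the value of its 'type' attribute.
human_types = {elem.tag: elem.get("type") for elem in root}

child_type = root.find("child").get("type")
adult_type = root.find("adult").get("type")

print(human_types)
print(child_type)
print(adult_type)
```

When lxml itself is in use, its elements also provide an xpath() method, so an expression such as root.xpath('./child/@type') should return the matching attribute values as a list; that call is not available in the standard-library fallback.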
Answer 3 (score: 0)
Another example using the SimplifiedDoc library:
from simplified_scrapy import SimplifiedDoc, utils
xml = '''<xml>
<child type = "smallHuman"/>
<adult type = "largeHuman"/>
</xml>'''
doc = SimplifiedDoc(xml).select('xml')
# first
child_type = doc.child['type']
print(child_type)
adult_type = doc.adult['type']
print(adult_type)
# second
child_type = doc.select('child').get('type')
adult_type = doc.select('adult').get('type')
print(child_type)
print(adult_type)
# third
child_type = doc.select('child>type()')
adult_type = doc.select('adult>type()')
print(child_type)
print(adult_type)
# fourth
nodes = doc.selects('child|adult>type()')
print(nodes)
# fifth
nodes = doc.children
print([node['type'] for node in nodes])