Extracting items from an XML file and converting them to a dict in Python

Time: 2019-01-03 13:59:01

Tags: python xml

I have a file named core-site.xml:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/centos/hadoop_tmp/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://test:9000</value>
    </property>
</configuration>

How can I get a dictionary like this in Python?

{'hadoop.tmp.dir': 'file:/home/centos/hadoop_tmp/tmp', 'fs.defaultFS': 'hdfs://test:9000'}

2 answers:

Answer 0 (score: 2)

You should use the ElementTree library from the Python standard library, documented here: https://docs.python.org/2/library/xml.etree.elementtree.html

First, pass the .xml file to ElementTree and get the root element:

import xml.etree.ElementTree as ET
tree = ET.parse('core-site.xml')
root = tree.getroot()
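
If the XML is already in memory as a string rather than in a file, `ET.fromstring` returns the root element directly, so no `getroot()` call is needed. A minimal sketch, assuming the sample document from the question is held in a variable named `xml_text` (a name chosen here for illustration):

```python
import xml.etree.ElementTree as ET

# Sample content from the question, held in memory instead of core-site.xml.
xml_text = """<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/centos/hadoop_tmp/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://test:9000</value>
    </property>
</configuration>"""

# fromstring() parses the text and returns the root Element itself.
root = ET.fromstring(xml_text)
print(root.tag)                       # configuration
print(len(root.findall('property')))  # 2
```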

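Once that is done, you can use the root object to walk the XML document. findall('property') returns every <property> element directly under the root: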

for entry in root.findall('property'):
    pass  # each entry is one <property> element

Inside this loop you can extract the name and value from each property:

for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    print(name)
    print(value)
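
Note that `entry.find('name').text` raises an `AttributeError` when the child element is missing, because `find` returns `None` in that case. One possible way around this is `findtext`, which returns a default instead. A small sketch on a made-up document (the keys `a.key` and `b.key` are hypothetical, not from the question):

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the second <property> has no <value> child.
root = ET.fromstring(
    "<configuration>"
    "<property><name>a.key</name><value>1</value></property>"
    "<property><name>b.key</name></property>"
    "</configuration>"
)

pairs = []
for entry in root.findall('property'):
    # findtext() returns the default instead of raising when the child is absent.
    name = entry.findtext('name', default=None)
    value = entry.findtext('value', default=None)
    pairs.append((name, value))

print(pairs)  # [('a.key', '1'), ('b.key', None)]
```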

You want to collect these into a dictionary, which is straightforward:

configuration = dict()
for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    configuration[name] = value

You then have a dictionary containing the whole XML configuration. The complete script:

import xml.etree.ElementTree as ET
tree = ET.parse('core-site.xml')
root = tree.getroot()
configuration = dict()
for entry in root.findall('property'):
    name = entry.find('name').text
    value = entry.find('value').text
    configuration[name] = value
print(configuration)
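
The same steps can also be wrapped in a reusable function. This is only a sketch: the function name `load_properties` and the choice to skip entries that lack a `<name>` are assumptions made for illustration, not part of the answer above.

```python
import xml.etree.ElementTree as ET


def load_properties(path):
    """Return {name: value} for each <property> under the root of the file at `path`."""
    root = ET.parse(path).getroot()
    return {
        entry.findtext('name'): entry.findtext('value')
        for entry in root.findall('property')
        # Assumption: entries without a <name> child are skipped.
        if entry.find('name') is not None
    }
```

Calling `load_properties('core-site.xml')` on the file from the question should give the two-entry dictionary the question asks for.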

Answer 1 (score: 0)

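This question already has an accepted answer, but since I commented on it, I'd like to give an example that uses one of the modules I suggested (xmltodict, a third-party package).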

xml = '''<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/centos/hadoop_tmp/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://test:9000</value>
    </property>
</configuration>'''

import xmltodict

# Parse the XML string into nested dicts. force_list keeps 'property' a list
# even when the document contains only a single <property> element.
test = xmltodict.parse(xml, force_list=('property',))
# Dictionary that will hold the name -> value pairs
temp_dict = {}
# Walk the top-level elements (here only 'configuration')
for root_name in test:
    # Check that we have the needed 'property' key before processing the leaf
    if 'property' in test[root_name]:
        # For each property leaf ('prop' avoids shadowing the built-in 'property')
        for prop in test[root_name]['property']:
            # If the leaf has the data you need, print it
            if 'name' in prop:
                print('Found name', prop['name'])
            if 'value' in prop:
                print('With value', prop['value'])
            # Then save it to the dictionary in the form you need.
            # Note: if "name" strings are duplicated, only the last "value" is kept
            temp_dict[prop['name']] = prop['value']

print(temp_dict)

This is the output:

Found name hadoop.tmp.dir
With value file:/home/centos/hadoop_tmp/tmp
Found name fs.defaultFS
With value hdfs://test:9000
{'hadoop.tmp.dir': 'file:/home/centos/hadoop_tmp/tmp', 'fs.defaultFS': 'hdfs://test:9000'}
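
The duplicate-name caveat mentioned in the code comments applies to any approach that assigns into a plain dict. If every value needs to be kept, one option is to collect them into lists. The sketch below uses only the standard library and a made-up document in which `fs.defaultFS` appears twice; the host names `a` and `b` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input in which fs.defaultFS is declared twice.
root = ET.fromstring(
    "<configuration>"
    "<property><name>fs.defaultFS</name><value>hdfs://a:9000</value></property>"
    "<property><name>fs.defaultFS</name><value>hdfs://b:9000</value></property>"
    "</configuration>"
)

last_wins = {}
all_values = defaultdict(list)
for entry in root.findall('property'):
    name = entry.findtext('name')
    value = entry.findtext('value')
    last_wins[name] = value         # later duplicates overwrite earlier ones
    all_values[name].append(value)  # keeps every value, in document order

print(last_wins)         # {'fs.defaultFS': 'hdfs://b:9000'}
print(dict(all_values))  # {'fs.defaultFS': ['hdfs://a:9000', 'hdfs://b:9000']}
```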