Question

I have the following XML file, named Artists.xml, which contains information about a few artists:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Artists>
<Singer name="Britney">
    <Albums>7</Albums>
    <Country>USA</County>
    <Last Single>  Piece of Me
      <Year>2011</Year>
   </Last Single>
</Singer>
<Singer name="Justin">
    <Albums>8</Albums>
    <Country>USA</County>
    <Last Single> Rock Your Body
      <Year>2004</Year>
   </Last Single>
</Singer>
</Artsts>

I am using the Python library ElementTree to extract the contents of all the tags. This is the Python code I have written so far:

from xml.etree import cElementTree as ET
tree = ET.parse('Artists.xml')
root = tree.getroot()
for child in root:
    for content in child:
       print(child[content].text)

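However, when I run the script I see no output in the console. I would like to see something like 7 USA Piece of Me 2011, 8 USA Rock Your Body 2004. Can someone help me understand what I am doing wrong? Thanks in advance!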

Answer 1

Using xml.etree.ElementTree

test.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Artists>
    <Singer name="Britney">
        <Albums>7</Albums>
        <Country>USA</Country>
        <LastSingle>
               Piece of Me
              <Year>2011</Year>
       </LastSingle>
    </Singer>
    <Singer name="Justin">
        <Albums>8</Albums>
        <Country>USA</Country>
        <LastSingle> Rock Your Body
          <Year>2004</Year>
       </LastSingle>
    </Singer>
</Artists>
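Note that this file differs from the sample in the question: the closing tags now match (`</Country>`, `</Artists>`) and the `Last Single` tag name no longer contains a space. A minimal sketch of why that matters, assuming a shortened inline string (the `broken` variable is made up for illustration) in place of the real file:

```python
from xml.etree import ElementTree as ET

# Hypothetical, shortened copy of the question's data with the same
# mismatched closing tag (<Country> closed by </County>).
broken = "<Artists><Singer><Country>USA</County></Singer></Artists>"

try:
    ET.fromstring(broken)
    error = None
except ET.ParseError as exc:
    # ElementTree refuses to build a tree from XML that is not well-formed.
    error = exc

print("parse failed:", error)
```

Parsing stops at the first mismatched tag, so the loops never get a tree to walk until the file itself is repaired.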

Then, with that file:

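from xml.etree import ElementTree

tree = ElementTree.parse('test.xml')
root = tree.getroot()
results = root.findall('Singer')  # direct <Singer> children of <Artists>

for elem in results:
    for e in elem:  # direct children only; the nested <Year> is not visited
        print((e.text or '').strip())  # text is None for an empty element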

Output:

7
USA
Piece of Me
8
USA
Rock Your Body

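The output above omits the years because the inner loop visits only the direct children of each `Singer`. A minimal sketch of one way to include nested elements such as `Year`, using `Element.iter()` to walk all descendants; the inline `XML` string and the `singer_texts` helper are assumptions added here so the example is self-contained:

```python
from xml.etree import ElementTree as ET

# Inline copy of test.xml so the sketch runs without a file on disk.
XML = """<Artists>
    <Singer name="Britney">
        <Albums>7</Albums>
        <Country>USA</Country>
        <LastSingle>
            Piece of Me
            <Year>2011</Year>
        </LastSingle>
    </Singer>
    <Singer name="Justin">
        <Albums>8</Albums>
        <Country>USA</Country>
        <LastSingle> Rock Your Body
            <Year>2004</Year>
        </LastSingle>
    </Singer>
</Artists>"""


def singer_texts(singer):
    """Collect the non-empty text of a <Singer> and all its descendants."""
    texts = []
    for elem in singer.iter():  # the element itself, then every descendant
        text = (elem.text or "").strip()
        if text:
            texts.append(text)
    return texts


root = ET.fromstring(XML)
lines = [" ".join(singer_texts(s)) for s in root.findall("Singer")]
for line in lines:
    print(line)
```

This prints one line per singer in the shape the question asks for. It reads only each element's `text`, so any text that follows a closing child tag (the `tail`) is ignored.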

Answer 2

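A generic approach: convert the XML to a dictionary and print the dictionary. (The file 55726013.xml contains your sample data.) As you can see, the code has zero knowledge of the XML structure.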

import xmltodict
import json

with open('55726013.xml') as fd:
    doc = xmltodict.parse(fd.read())

print(json.dumps(doc, indent=4))

Output

{
    "Artists": {
        "Singer": [
            {
                "@name": "Britney", 
                "Albums": "7", 
                "Country": "USA", 
                "LastSingle": {
                    "Year": "2011", 
                    "#text": "Piece of Me"
                }
            }, 
            {
                "@name": "Justin", 
                "Albums": "8", 
                "Country": "USA", 
                "LastSingle": {
                    "Year": "2004", 
                    "#text": "Rock Your Body"
                }
            }
        ]
    }
}
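If installing a third-party package is not an option, a similar structure can be approximated with the standard library alone. The following is only a rough sketch: `element_to_dict` is a hypothetical helper that imitates the `@attribute` and `#text` key convention seen in the output above. It is not how xmltodict is implemented, and it skips details such as namespaces and tail text.

```python
from xml.etree import ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into plain dicts, lists and strings."""
    # Attributes become keys prefixed with "@".
    node = {"@" + key: value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tags are gathered into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        # Leaf element without attributes: just its text.
        return text
    if text:
        node["#text"] = text
    return node


# Shortened sample data, inlined so the sketch is self-contained.
XML = (
    '<Artists>'
    '<Singer name="Britney"><Albums>7</Albums>'
    '<LastSingle> Piece of Me <Year>2011</Year></LastSingle></Singer>'
    '<Singer name="Justin"><Albums>8</Albums></Singer>'
    '</Artists>'
)

root = ET.fromstring(XML)
doc = {root.tag: element_to_dict(root)}
print(doc["Artists"]["Singer"][0]["LastSingle"]["Year"])
```

Like the xmltodict version, the helper needs no knowledge of the tag names. Individual values are then reached by ordinary dictionary and list indexing.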

How do I extract all content from all elements of an XML file using Python and ElementTree?
