Question

I am trying to get data out of the XML file below. For each Type, the Explanation should appear right next to it.

For example:

Orange They belong to the Citrus.They cannot grow at a temperature below
Lemon They belong to the Citrus.They cannot grow at a temperature below

<Fruits>
    <Fruit>
        <Family>Citrus</Family>
        <Explanation>They belong to the Citrus.They cannot grow at a temperature below</Explanation>
        <Type>Orange</Type>
        <Type>Lemon</Type>
        <Type>Lime</Type>
        <Type>Grapefruit</Type>
    </Fruit>
    <Fruit>
        <Family>Pomes</Family>
        <Type>Apple</Type>
        <Type>Pear</Type>        
    </Fruit>
</Fruits>

This works with the code below. However, I have a problem with the second Fruit family, because it has no Explanation.

import os
from xml.etree import ElementTree
file_name = "example.xml"
full_file = os.path.abspath(os.path.join("xml", file_name))
dom = ElementTree.parse(full_file)
Fruit = dom.findall("Fruit")

for f in Fruit:
    Explanation = f.find("Explanation").text
    Types = f.findall("Type")
    for t in Types:
        Type = t.text
        print("{0}, {1}".format(Type, Explanation))

How can I skip an element such as the Fruit family Pomes when its Explanation child is missing?

Answer 1

With xml.etree, just try to find the Explanation child:

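from xml.etree import ElementTree as et

# `xml` is the XML document as a string (see the sample session further down).
root = et.fromstring(xml)

for node in root.iter("Fruit"):
    # find() returns None when the Fruit has no Explanation child.
    if node.find("Explanation") is not None:
        print(node.find("Family").text)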
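
To tie this back to the loop in the question, a minimal sketch might look like the following. It assumes the sample XML from the question is held in a string; the names `SAMPLE_XML` and `type_explanation_pairs` are hypothetical and introduced only for illustration. It collects one (Type, Explanation) pair per Type and skips any Fruit that has no Explanation child.

```python
from xml.etree import ElementTree as et

# Sample document from the question, held in a string for illustration.
SAMPLE_XML = """<Fruits>
    <Fruit>
        <Family>Citrus</Family>
        <Explanation>They belong to the Citrus.They cannot grow at a temperature below</Explanation>
        <Type>Orange</Type>
        <Type>Lemon</Type>
        <Type>Lime</Type>
        <Type>Grapefruit</Type>
    </Fruit>
    <Fruit>
        <Family>Pomes</Family>
        <Type>Apple</Type>
        <Type>Pear</Type>
    </Fruit>
</Fruits>"""


def type_explanation_pairs(xml_text):
    """Return (type, explanation) tuples, skipping Fruit elements without an Explanation."""
    root = et.fromstring(xml_text)
    pairs = []
    for fruit in root.iter("Fruit"):
        explanation = fruit.find("Explanation")
        if explanation is None:
            # No Explanation child: skip this Fruit entirely.
            continue
        for t in fruit.findall("Type"):
            pairs.append((t.text, explanation.text))
    return pairs


for type_name, explanation in type_explanation_pairs(SAMPLE_XML):
    print("{0}, {1}".format(type_name, explanation))
```

Storing the result of `find()` once and checking it against `None` avoids the `AttributeError` that `f.find("Explanation").text` raises for the Pomes element.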

You can also use XPath with lxml to select only the Fruit nodes that have an Explanation child:

import lxml.etree as et

root = et.fromstring(xml)

for node in root.xpath("//Fruit[Explanation]"):
    print(node.xpath("Family/text()"))

If we run it on your sample, you can see that we only get Citrus:

In [1]: xml = """<Fruits>
   ...:     <Fruit>
   ...:         <Family>Citrus</Family>
   ...:         <Explanation>They belong to the Citrus.They cannot grow at a temperature below</Explanation>
   ...:         <Type>Orange</Type>
   ...:         <Type>Lemon</Type>
   ...:         <Type>Lime</Type>
   ...:         <Type>Grapefruit</Type>
   ...:     </Fruit>
   ...:         <Fruit>
   ...:         <Family>Pomes</Family>
   ...:         <Type>Apple</Type>
   ...:         <Type>Pear</Type>
   ...:     </Fruit>
   ...: </Fruits>"""


In [2]: import lxml.etree as et

In [3]: root = et.fromstring(xml)

In [4]: for node in root.xpath("//Fruit[Explanation]"):
   ...:         print(node.xpath("Family/text()"))
   ...:     
['Citrus']
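
If lxml is not available, the standard library's xml.etree.ElementTree supports a limited XPath subset that includes the `[tag]` predicate, so a similar filter can be sketched without a third-party package. The example below uses a shortened copy of the sample; the names `sample` and `families_with_explanation` are hypothetical, chosen only for illustration.

```python
from xml.etree import ElementTree as et

# Shortened copy of the sample document (one Type per Fruit).
sample = """<Fruits>
    <Fruit>
        <Family>Citrus</Family>
        <Explanation>They belong to the Citrus.They cannot grow at a temperature below</Explanation>
        <Type>Orange</Type>
    </Fruit>
    <Fruit>
        <Family>Pomes</Family>
        <Type>Apple</Type>
    </Fruit>
</Fruits>"""


def families_with_explanation(xml_text):
    """Return the Family text of every Fruit that has an Explanation child."""
    root = et.fromstring(xml_text)
    # "Fruit[Explanation]" selects Fruit children of the root
    # that themselves have a child named Explanation.
    return [fruit.findtext("Family") for fruit in root.findall("Fruit[Explanation]")]


print(families_with_explanation(sample))  # prints ['Citrus']
```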
