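Parsing XML with Python, more than one level deep

Time: 2014-04-22 14:54:43

Tags: python xml parsing

I have a very large XML file and want to retrieve certain records based on the text of a child node. Given the XML below, I want to get the price value whenever an item's taste is "good". I tried minidom and ET.ElementTree but could not find a suitable approach.

I want to do something like this: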

from xml.dom.minidom import parse, parseString
dom = parse( "file.xml" )
for node in dom.getElementsByTagName('food'):
    node_child=node.getAttribute('description')
       taste=node_child.getAttribute('taste')
       if taste=='good':
          price=node.getAttribute('price')

Any ideas?

<breakfast_menu>
 <food>
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description>
   <taste>good</taste>
   <sight>bad</sight>
 </description>
 <calories>650</calories>
</food>
<food>
 <name>Strawberry Belgian Waffles</name>
 <price>$7.95</price>
 <description>
   <taste>bad</taste>
   <sight>bad</sight>
 </description>
 <calories>900</calories>
</food>
<food>
 <name>Berry-Berry Belgian Waffles</name>
 <price>$8.95</price>
 <description>
  <taste>good</taste>
  <sight>good</sight>
 </description>
 <calories>900</calories>
</food>
<food>
 <name>French Toast</name>
 <price>$4.50</price>
 <description>
   <taste>good</taste>
   <sight>bad</sight>
 </description>
 <calories>600</calories>
</food>

3 answers:

Answer 0: (score: 1)

You can use lxml to parse it.

Code:

from lxml import html

data = """
    <breakfast_menu>
        <food>
            <name>Belgian Waffles</name>
            <price>$5.95</price>
            <description>
                <taste>good</taste>
                <sight>bad</sight>
            </description>
            <calories>650</calories>
        </food>
        <food>
            <name>Strawberry Belgian Waffles</name>
            <price>$7.95</price>
            <description>
                <taste>bad</taste>
                <sight>bad</sight>
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>Berry-Berry Belgian Waffles</name>
            <price>$8.95</price>
            <description>
                <taste>good</taste>
                <sight>good</sight>
            </description>
            <calories>900</calories>
        </food>
        <food>
            <name>French Toast</name>
            <price>$4.50</price>
            <description>
                <taste>good</taste>
                <sight>bad</sight>
            </description>
            <calories>600</calories>
        </food>
    """

tree = html.fromstring(data)
tastes = tree.xpath("//taste")
for taste in tastes:
    # <taste> sits two levels below <food>: food -> description -> taste
    foodparent = taste.getparent().getparent()
    name = foodparent.xpath("name")[0].text
    if taste.text == "good":
        price = foodparent.xpath("price")[0].text
        print("%s: %s" % (name, price))
    else:
        print("%s: %s" % (name, "Taste is bad, yuck."))

Result:

Belgian Waffles: $5.95
Strawberry Belgian Waffles: Taste is bad, yuck.
Berry-Berry Belgian Waffles: $8.95
French Toast: $4.50

Let us know if this helps.

Answer 1: (score: 0)

Here is a solution using ElementTree:

import xml.etree.ElementTree as et

tree = et.parse('breakfast.xml')
root = tree.getroot()
for food in root.findall('food'):
    if food.find('description').find('taste').text == 'good':
        price = food.find('price').text
        print("found good food:{0} at price {1}".format(food.find('name').text, price))

Result:

found good food:Belgian Waffles at price $5.95
found good food:Berry-Berry Belgian Waffles at price $8.95
found good food:French Toast at price $4.50

Edit: I also had to fix your XML, because the closing </breakfast_menu> tag was missing.
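
For reference, here is a minimal sketch of why that fix matters: ElementTree is a strict XML parser, so input whose root element is never closed raises `ParseError`. The shortened menu string and the `parse_menu` helper are illustrative stand-ins, not the original file or part of the answer's code.

```python
import xml.etree.ElementTree as et

# Shortened stand-in for the question's XML; note the missing </breakfast_menu>.
broken_xml = """<breakfast_menu>
 <food>
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description><taste>good</taste><sight>bad</sight></description>
 </food>
"""

def parse_menu(xml_text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return et.fromstring(xml_text)
    except et.ParseError:
        return None

# The root element is never closed, so strict parsing fails.
root_broken = parse_menu(broken_xml)
# Appending the missing closing tag makes the document well-formed.
root_fixed = parse_menu(broken_xml + "</breakfast_menu>")

print(root_broken is None)
print(root_fixed.tag)
```

This strictness may also be why the lxml answer gets away with the unclosed input: it uses the lenient HTML parser rather than a strict XML one.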

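Answer 2: (score: 0)

Assuming your XML is stored in a string variable named xml_string, you can use ElementTree's XPath support to select every food element that contains a description element with a taste child whose value is "good". You can then extract whatever information you need from each food element.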

from xml.etree import ElementTree

tree = ElementTree.fromstring(xml_string)

food_elements = tree.findall('.//food/description[taste="good"]/..')
prices = [(food.find('name').text, food.find('price').text) for food in food_elements]
print(prices)

This prints:

[('Belgian Waffles', '$5.95'), ('Berry-Berry Belgian Waffles', '$8.95'), ('French Toast', '$4.50')]
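
Since the question mentions a very large file, a streaming variant may be worth considering. The following is only a sketch, assuming the same food/description/taste layout as above. The `good_food_prices` helper name and the in-memory `BytesIO` sample are illustrative; in practice a file path could be passed to `iterparse` instead.

```python
import io
import xml.etree.ElementTree as ET

def good_food_prices(source):
    """Yield (name, price) for each <food> whose description/taste is 'good'.

    `source` is a filename or a binary file object. Each <food> is handled
    when its end tag is seen, then cleared to keep memory use low.
    """
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "food":
            continue
        if elem.findtext("description/taste") == "good":
            yield elem.findtext("name"), elem.findtext("price")
        elem.clear()  # drop the children of the <food> we just handled

# Small in-memory sample standing in for the large file (assumption).
sample = b"""<breakfast_menu>
 <food><name>Belgian Waffles</name><price>$5.95</price>
  <description><taste>good</taste><sight>bad</sight></description></food>
 <food><name>Strawberry Belgian Waffles</name><price>$7.95</price>
  <description><taste>bad</taste><sight>bad</sight></description></food>
</breakfast_menu>"""

results = list(good_food_prices(io.BytesIO(sample)))
print(results)
```

Unlike `fromstring`, this never builds the full tree with all of its contents at once, at the cost of giving up the single-expression XPath selection.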