How do I parse an XML file with lxml and get its elements and attributes?

Asked: 2011-11-16 17:19:15

Tags: python xml lxml elementtree

I have an XML template like this:

    <Car xmlns="http://example.com/vocab/xml/cars#">
      <dateStarted>{{date_started|escape}}</dateStarted>
      <dateSold>{{date_sold|escape}}</dateSold>
      <name type="{{name_type}}" abbrev="{{name_abbrev}}" value="{{name_value}}" >{{name|escape}}</name>
      <brandName type="{{brand_name_type}}" abbrev="{{brand_name_abbrev}}" value="{{brand_name_value}}" >{{brand_name|escape}}</brandName>
      <maxspeed>
        <value>{{speed_value}}</value>
        <unit type="{{speed_unit_type}}" value="{{speed_unit_value}}" abbrev="{{speed_unit_abbrev}}" />
      </maxspeed>
      <route type="{{route_type}}" value="{{route_value}}" abbrev="{{route_abbrev}}">{{by_route|escape}}</route>
      <power>
        <value>{{strength_value|escape}}</value>
        <unit type="{{strength_unit_type}}" value="{{ strength_unit_value }}" abbrev="{{ strength_unit_abbrev }}" />
      </power>
      <frequency type="{{ frequency_type }}" value="{{frequency_value}}" abbrev="{{ frequency_abbrev }}">{{frequency|escape}}</frequency>
    </Car>

I wrote a Python function, parse_car, to parse a string in the format above:

    def parse_car(etree):
        NS = "{http://example.com/vocab/xml/cars#}"
        CODES_NS = "{http://example.com/codes/}"

        return {
            'date_started': etree.findtext('%sdateStarted' % NS),
            'date_stopped': etree.findtext('%sdateStopped' % NS),
            'name': etree.findtext('%sname' % NS),
            'brand_name': etree.findtext('%sbrandName' % NS),
            'maxspeed': etree.findtext('%smaxspeed/value' % NS),
            'maxspeed_unit': etree.findtext('%smaxspeed/value' % NS).get('abbrev'),
            'route': etree.findtext('%sroute' % NS),
            'power': etree.findtext('%spower/value' % NS),
            'power_unit': etree.find('%spower/value' % NS).get('abbrev'),
            'frequency': etree.findtext('%sfrequency' % NS),
        }
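Two details of the ElementTree-style path API are worth checking against the function above. The sketch below is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree` (whose `find` and `findtext` lxml mirrors) so that it runs without lxml installed, and `SAMPLE` is a hypothetical, cut-down document in the question's format. It shows that every step of a path needs the namespace prefix, and that `findtext` returns a string (or `None`), so attributes have to be read from the element returned by `find`.

```python
import xml.etree.ElementTree as ET

NS = "{http://example.com/vocab/xml/cars#}"

# Hypothetical, cut-down sample in the question's format.
SAMPLE = """<Car xmlns="http://example.com/vocab/xml/cars#">
  <maxspeed>
    <value>250</value>
    <unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
  </maxspeed>
</Car>"""

car = ET.fromstring(SAMPLE)

# Only the first step is namespace-qualified, so nothing matches.
unqualified = car.findtext("%smaxspeed/value" % NS)

# Every step carries the namespace, so the text is found.
qualified = car.findtext("%smaxspeed/%svalue" % (NS, NS))

# findtext() gives a str (or None); find() gives an Element (or None),
# and only an Element has .get() for reading attributes.
unit = car.find("%smaxspeed/%sunit" % (NS, NS))
abbrev = unit.get("abbrev") if unit is not None else None

print(unqualified, qualified, abbrev)
```

Under these assumptions the unqualified lookup returns `None`, while the qualified lookup and the attribute read return the values from the sample.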

But I only get part of the result. This is what I get; it stops at route:

    <Car xmlns="http://example.com/vocab/xml/cars#">
     <dateStarted>2011-02-05</dateStarted>
     <dateStopped>2011-02-13</dateStopped>         
    <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
      </maxspeed>
      <route type="http://...'

This is the final result I expect:

    <Car xmlns="http://example.com/vocab/xml/cars#">
      <dateStarted>2011-02-05</dateStarted>
      <dateSold>2011-02-13</dateSold>
      <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
      <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
      </maxspeed>
      <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
      <power>
        <value>{{strength_value|escape}}</value>
        <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
      </power>
      <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>
    </Car>

Could you give me some advice on why it doesn't work? Am I missing something here? Thank you very much!
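Without a traceback the cause of the truncated output cannot be pinned down, but parse_car as posted should fail on any input: it calls `.get('abbrev')` on the result of `findtext`, which is a string or `None`, and its nested paths qualify only the first step with the namespace. A possible revision is sketched below. It is not a confirmed fix, and it rests on assumptions: it uses the standard library's `xml.etree.ElementTree` in place of lxml (the `find`/`findtext` calls are the same), the helpers `q` and `attr` are hypothetical names introduced here, the unit abbreviation is taken from the `<unit>` element, and `SAMPLE` is a made-up document in the question's format.

```python
import xml.etree.ElementTree as ET

NS = "{http://example.com/vocab/xml/cars#}"


def q(path):
    """Qualify every step of a slash-separated path with NS (hypothetical helper)."""
    return "/".join(NS + step for step in path.split("/"))


def attr(root, path, name):
    """Return attribute `name` of the first match for `path`, or None (hypothetical helper)."""
    element = root.find(q(path))
    return element.get(name) if element is not None else None


def parse_car(car):
    """Extract the question's fields from a <Car> element; missing ones become None."""
    return {
        "date_started": car.findtext(q("dateStarted")),
        # The template uses <dateSold> while the sample output uses <dateStopped>;
        # this keeps the original lookup, which yields None for <dateSold>.
        "date_stopped": car.findtext(q("dateStopped")),
        "name": car.findtext(q("name")),
        "brand_name": car.findtext(q("brandName")),
        "maxspeed": car.findtext(q("maxspeed/value")),
        "maxspeed_unit": attr(car, "maxspeed/unit", "abbrev"),
        "route": car.findtext(q("route")),
        "power": car.findtext(q("power/value")),
        "power_unit": attr(car, "power/unit", "abbrev"),
        "frequency": car.findtext(q("frequency")),
    }


# Made-up sample; the power value 300 is invented for illustration.
SAMPLE = """<Car xmlns="http://example.com/vocab/xml/cars#">
  <dateStarted>2011-02-05</dateStarted>
  <dateStopped>2011-02-13</dateStopped>
  <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6">BMW X6</name>
  <maxspeed>
    <value>250</value>
    <unit type="http://example.com/codes/units#" value="miles" abbrev="mph" />
  </maxspeed>
  <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway">Highway</route>
  <power>
    <value>300</value>
    <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
  </power>
</Car>"""

result = parse_car(ET.fromstring(SAMPLE))
print(result)
```

Elements that are absent from the input (here `brandName` and `frequency`) come back as `None` instead of raising, which makes it easier to see whether the input itself is cut short at `<route>` before it ever reaches the parser.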

0 answers:

No answers yet.