Parsing an XML block with lxml

Date: 2012-06-22 18:53:28

Tags: python xml lxml

Given the following XML:

<language>en-US</language>
<provider>VenturesLLC</provider>
<video>
    <original_spoken_locale>en-US</original_spoken_locale>
    <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
    <release_date>2011-01-15</release_date>
    <title>Moving Forward</title>
    <vendor_id>ASDF_ING_2012</vendor_id>
</video>

I want to retrieve the entire <video> block. However, when I do this:

>>> from lxml import etree
>>> f=open('metadata.xml')
>>> contents=f.read()
>>> node=etree.fromstring(contents)
>>> node.xpath("//*[local-name()='video']")[0].text
'\n    '

Note that if I do something like node.xpath("//*[local-name()='original_spoken_locale']")[0].text, I get the correct value, 'en-US'. How can I get the full text of the block, so that I end up with:

text = """    
<video>
    <original_spoken_locale>en-US</original_spoken_locale>
    <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
    <release_date>2011-01-15</release_date>
    <title>Moving Forward</title>
    <vendor_id>ASDF_ING_2012</vendor_id>
</video>"""
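A minimal sketch (Python 3, lxml) that reproduces what the question describes. It assumes the fragment is wrapped in a made-up <metadata> root element, because the snippet as shown has several top-level elements and would not parse on its own. It shows that .text on <video> is only the whitespace before the first child, while the actual values sit on the child elements.

```python
from lxml import etree

# The question's fragment has several top-level elements, so it is wrapped
# in a hypothetical <metadata> root to make it a well-formed document.
xml = b"""<metadata>
<language>en-US</language>
<provider>VenturesLLC</provider>
<video>
    <original_spoken_locale>en-US</original_spoken_locale>
    <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
    <release_date>2011-01-15</release_date>
    <title>Moving Forward</title>
    <vendor_id>ASDF_ING_2012</vendor_id>
</video>
</metadata>"""

root = etree.fromstring(xml)
video = root.xpath("//*[local-name()='video']")[0]

# .text is only the text between <video> and its first child element,
# which here is just the newline plus indentation.
print(repr(video.text))  # '\n    '

# The real content lives in the child elements, each with its own .text.
for child in video:
    print(child.tag, child.text)
```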

1 Answer:

Answer 0 (score: 2)

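Your .text call doesn't give you what you want because .text only holds the text between the opening <video> tag and its first child element, which here is just a newline and some indentation. Everything else lives in child nodes, so you need to use tostring to serialize the element and its children to a string:
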
In [1]: from lxml import etree

In [2]: xml = '''<xml>
   ...: <language>en-US</language>
   ...: <provider>VenturesLLC</provider>
   ...: <video>
   ...:     <original_spoken_locale>en-US</original_spoken_locale>
   ...:     <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
   ...:     <release_date>2011-01-15</release_date>
   ...:     <title>Moving Forward</title>
   ...:     <vendor_id>ASDF_ING_2012</vendor_id>
   ...: </video></xml>'''

In [3]: tree = etree.fromstring(xml)

In [4]: vid = tree.xpath('//video')[0]

In [5]: etree.tostring(vid, pretty_print=True)
Out[5]: '<video>\n  <original_spoken_locale>en-US</original_spoken_locale>\n  <vendor_offer_code>TEST_VENDOR</vendor_offer_code>\n  <release_date>2011-01-15</release_date>\n  <title>Moving Forward</title>\n  <vendor_id>ASDF_ING_2012</vendor_id>\n</video>\n'

In [6]: print _
<video>
  <original_spoken_locale>en-US</original_spoken_locale>
  <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
  <release_date>2011-01-15</release_date>
  <title>Moving Forward</title>
  <vendor_id>ASDF_ING_2012</vendor_id>
</video>
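The question queries with local-name() while the answer uses //video. If the real metadata.xml declares a default namespace (an assumption; the snippet in the question does not show one), //video matches nothing. A sketch with a hypothetical namespace URI, showing two ways to select the element and what tostring then produces:

```python
from lxml import etree

# Hypothetical namespace URI, used only for illustration.
NS = "http://example.com/metadata"

xml = f"""<package xmlns="{NS}">
<video>
    <title>Moving Forward</title>
</video>
</package>""".encode("utf-8")

root = etree.fromstring(xml)

# A bare //video matches nothing: the element's qualified name is {NS}video.
print(root.xpath("//video"))  # []

# Option 1: bind a prefix to the namespace URI for the XPath query.
video = root.xpath("//m:video", namespaces={"m": NS})[0]

# Option 2: ignore namespaces with local-name(), as the question does.
same = root.xpath("//*[local-name()='video']")[0]
print(video.tag == same.tag)  # True

# Serialized on its own, the subtree carries the namespace declaration it needs.
serialized = etree.tostring(video, encoding="unicode", with_tail=False)
print(serialized)
```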