Given the following XML:
<language>en-US</language>
<provider>VenturesLLC</provider>
<video>
<original_spoken_locale>en-US</original_spoken_locale>
<vendor_offer_code>TEST_VENDOR</vendor_offer_code>
<release_date>2011-01-15</release_date>
<title>Moving Forward</title>
<vendor_id>ASDF_ING_2012</vendor_id>
</video>
I want to retrieve the entire <video> block. However, when I do this:
>>> from lxml import etree
>>> f=open('metadata.xml')
>>> contents=f.read()
>>> node=etree.fromstring(contents)
>>> node.xpath("//*[local-name()='video']")[0].text
'\n
Note that if I do something like node.xpath("//*[local-name()='original_spoken_locale']")[0].text, I get the correct value, 'en-US'. How can I get the full text of the block, so that I end up with:
text = """
<video>
<original_spoken_locale>en-US</original_spoken_locale>
<vendor_offer_code>TEST_VENDOR</vendor_offer_code>
<release_date>2011-01-15</release_date>
<title>Moving Forward</title>
<vendor_id>ASDF_ING_2012</vendor_id>
</video>"""
Answer 0 (score: 2)
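Your .text call does not return the markup because the video node has no text of its own beyond the whitespace before its first child; its content is child elements. You need to use etree.tostring: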
In [1]: from lxml import etree
In [2]: xml = '''<xml>
...: <language>en-US</language>
...: <provider>VenturesLLC</provider>
...: <video>
...: <original_spoken_locale>en-US</original_spoken_locale>
...: <vendor_offer_code>TEST_VENDOR</vendor_offer_code>
...: <release_date>2011-01-15</release_date>
...: <title>Moving Forward</title>
...: <vendor_id>ASDF_ING_2012</vendor_id>
...: </video></xml>'''
In [3]: tree = etree.fromstring(xml)
In [4]: vid = tree.xpath('//video')[0]
In [5]: etree.tostring(vid, pretty_print=True, encoding='unicode')
Out[5]: '<video>\n <original_spoken_locale>en-US</original_spoken_locale>\n <vendor_offer_code>TEST_VENDOR</vendor_offer_code>\n <release_date>2011-01-15</release_date>\n <title>Moving Forward</title>\n <vendor_id>ASDF_ING_2012</vendor_id>\n</video>\n'
In [6]: print(_)
<video>
<original_spoken_locale>en-US</original_spoken_locale>
<vendor_offer_code>TEST_VENDOR</vendor_offer_code>
<release_date>2011-01-15</release_date>
<title>Moving Forward</title>
<vendor_id>ASDF_ING_2012</vendor_id>
</video>
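
For comparison, here is a minimal self-contained sketch of the same idea. It swaps lxml for the standard library's xml.etree.ElementTree so that it runs without extra packages; that substitution, the shortened XML, and the variable names are assumptions made for illustration, not part of the session above. It shows why .text yields only a newline and how tostring returns the whole element.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the XML from the question,
# wrapped in a single root element so that it parses.
XML = """<xml>
<language>en-US</language>
<video>
<original_spoken_locale>en-US</original_spoken_locale>
<title>Moving Forward</title>
</video>
</xml>"""

root = ET.fromstring(XML)
video = root.find("video")

# .text holds only the text between <video> and its first child element,
# which here is just the newline after the opening tag.
leading_text = video.text

# tostring() serializes the element together with all of its children.
# encoding="unicode" requests a str instead of bytes; strip() drops the
# trailing whitespace (the element's tail) that is serialized with it.
video_markup = ET.tostring(video, encoding="unicode").strip()

print(repr(leading_text))
print(video_markup)
```

lxml's etree.tostring also accepts encoding="unicode", so the same pattern should carry over to the lxml session shown above.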