我在Python中使用请求模块获得响应,响应采用xml格式。我想解析它并从每个'dt'标签中获取详细信息。我无法使用lxml来做到这一点。
这是xml响应:
<?xml version="1.0" encoding="utf-8" ?>
<entry_list version="1.0">
<entry id="harsh">
<ew>harsh</ew><subj>MD-2</subj><hw>harsh</hw>
<sound><wav>harsh001.wav</wav><wpr>!h@rsh</wpr></sound>
<pr>ˈhärsh</pr>
<fl>adjective</fl>
<et>Middle English <it>harsk,</it> of Scandinavian origin; akin to Norwegian <it>harsk</it> harsh</et>
<def>
<date>14th century</date>
<sn>1</sn>
<dt>:having a coarse uneven surface that is rough or unpleasant to the touch</dt>
<sn>2 a</sn>
<dt>:causing a disagreeable or painful sensory reaction :<sx>irritating</sx></dt>
<sn>b</sn>
<dt>:physically discomforting :<sx>painful</sx></dt>
<sn>3</sn>
<dt>:unduly exacting :<sx>severe</sx></dt>
<sn>4</sn>
<dt>:lacking in aesthetic appeal or refinement :<sx>crude</sx></dt>
<ss>rough</ss>
</def>
<uro><ure>harsh*ly</ure> <fl>adverb</fl></uro>
<uro><ure>harsh*ness</ure> <fl>noun</fl></uro>
</entry>
</entry_list>
答案 0 :(得分:1)
一种简单的方法是遍历xml文档的层次结构。
import requests
from lxml import etree
re = requests.get(url)
root = etree.fromstring(re.content)
print(root.xpath('//entry_list/entry/def/dt/text()'))
这将为xml文档中的每个'dt'标记提供文本值。
答案 1 :(得分:0)
from xml.dom import minidom
# List with dt values
dt_elems = []
# Process xml getting elements by tag name
xmldoc = minidom.parse('text.xml')
itemlist = xmldoc.getElementsByTagName('dt')
# Get the values
for i in itemlist:
dt_elems.append(" ".join(t.nodeValue for t in i.childNodes if t.nodeType==t.TEXT_NODE))
# Print the list result
print dt_elems