Unable to parse XML from a URL in Python

Time: 2017-08-02 18:14:07

Tags: python xml xml-parsing

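I have spent several hours trying to parse this sample XML from a URL using Python, but I cannot extract the definition. This is what the sample looks like: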

<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>

I am trying to access the 'dt' tag, because that is where my definition is. This is a shortened version of the URL that contains the XML. Can anyone help me?

2 answers:

Answer 0: (score: 0)

This should work for you:

import xml.etree.ElementTree as ET

data = '''
<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

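# fromstring() returns the root element, here <entry_list>
flag = ET.fromstring(data)
# The path is relative to the root; find() returns the first matching <dt>
print(flag.find('entry/def/sensb/sens/dt').text)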
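
The answer above parses a hard-coded string, while the question asks about XML served from a URL. A minimal sketch of that step, using only the standard library, might look like the following. The helper names `extract_definitions` and `fetch_definitions` and the `timeout` default are illustrative assumptions, not part of the original answer, and the real URL is not given in the question, so it has to be supplied by the caller.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_definitions(xml_bytes):
    """Return the stripped text of every <dt> element in the document."""
    root = ET.fromstring(xml_bytes)
    # iter() walks the whole tree, so every <dt> in every <entry> is found.
    # itertext() also collects text that sits inside nested child tags,
    # which .text alone would miss.
    return ["".join(dt.itertext()).strip() for dt in root.iter("dt")]


def fetch_definitions(url, timeout=10):
    """Download the XML at `url` (hypothetical helper) and extract <dt> text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_definitions(response.read())


# Shortened copy of the sample from the question, used for an offline check.
SAMPLE = b"""<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <def><sensb><sens><dt> Blah blah blah
    </dt></sens></sensb></def>
  </entry>
</entry_list>"""

if __name__ == "__main__":
    print(extract_definitions(SAMPLE))
```

Keeping the download separate from the parsing means the parsing can be checked against the sample without network access; `fetch_definitions` would then be called with the actual URL.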

Answer 1: (score: 0)

If you install BeautifulSoup, something like this should work:

from bs4 import BeautifulSoup

xml = '''<entry_list version="1.0">
  <entry id="polycystic kidney disease">
    <ew>polycystic kidney disease</ew>
    <hw>polycystic kidney disease</hw>
    <fl>noun</fl>
    <def>
      <sensb>
        <sens>
          <dt> Blah blah blah
          </dt>
        </sens>
      </sensb>
    </def>
  </entry>
</entry_list>'''

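# Name the parser explicitly; "html.parser" ships with Python
parsed = BeautifulSoup(xml, "html.parser")

# find_all() is the current name for findAll()
for dt in parsed.find_all("dt"):
    # get_text() returns the tag's text as a string instead of a list of child nodes
    print(dt.get_text(strip=True))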