Extracting values from an XML file with Python based on a specific keyword

Posted: 2018-08-01 14:27:52

Tags: xml python-3.x

I want to extract the values of several nodes whenever the keyword "cardiovascular" appears in one of them. Here is my code:

import xml.etree.ElementTree as ET

tree = ET.parse(r'G:\My Drive\dru.xml')
for channel in tree.findall('.//drug'):
    comment = channel.find('indication')
    # Skip drugs with no <indication> element or an empty one
    if comment is not None and comment.text is not None:
        if "cardiovascular" in comment.text:
            print(comment.text)
            print(channel.find('name').text)

I don't get any errors, but I don't get any results either, even though I can see the keyword "cardiovascular" in the XML file. Here is part of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<drugbank xmlns="http://www.drugbank.ca" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.drugbank.ca http://www.dru.xsd" version="5.1" exported-on="2018-07-03">
<drug type="biotech" created="2005-06-13" updated="2018-07-02">
  <drugbank-id primary="true">DB00001</drugbank-id>
  <drugbank-id>BTD00024</drugbank-id>
  <drugbank-id>BIOD00024</drugbank-id>
  <name>Lepirudin</name>
  <description>Lepirudi.</description>
  <cas-number>138068-37-8</cas-number>
  <unii>Y43GF64R34</unii>
  <state>liquid</state>
  <groups>
    <group>approved</group>
  </groups>
  <general-references>
    <articles>
      <article>
        <pubmed-id>16244762</pubmed-id>
        <citation>Smythe</citation>
      </article>
      <article>
        <pubmed-id>16690967</pubmed-id>
        <citation>Tardy B,</citation>
      </article>
      <article>
        <pubmed-id>16241940</pubmed-id>
        <citation>ggg</citation>
      </article>
    </articles>
    <textbooks/>
    <links>
      <link>
        <title>Google books</title>
        <url>http://books.google.com/books?id=iadLoXoQkWEC&amp;pg=PA440</url>
      </link>
    </links>
  </general-references>
  <synthesis-reference/>
  <indication>For cardiovascular</indication>
  <pharmacodynamics>Lepirudin </pharmacodynamics>
  <mechanism-of-action>Lepirudin</mechanism-of-action>
  <toxicity>In ca</toxicity>
  <metabolism>Lepirudi</metabolism>
  <absorption>Bioavailability is 100% following injection.</absorption>
  <half-life>Approximately 1.3 hours</half-life>
  <protein-binding/>
  <route-of-elimination>Lepir</route-of-elimination>
  <volume-of-distribution>*</volume-of-distribution>
  <clearance>* 164 ml/min [Healthy 18-60 yrs]&#13;
* 139 ml/min [Healthy 65-80 yrs]&#13;
* 61 ml/min [renal impaired]&#13;
* 114 ml/min [HIT (Heparin-induced thrombocytopenia)]</clearance>
  <classification>......

I should also mention that the XML file contains multiple <drug> nodes, and that each <indication> node contains multiple <drug> nodes.
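A likely reason the loop finds nothing: the root element <drugbank> declares a default namespace (xmlns="http://www.drugbank.ca"), so ElementTree stores every tag as {http://www.drugbank.ca}drug, {http://www.drugbank.ca}indication, and so on, and the un-prefixed path './/drug' matches no elements. Below is a minimal sketch of the same loop with a namespace map. The prefix "db" is an arbitrary choice, and the inline sample is a trimmed copy of the excerpt above so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt above. With the real file, use
# root = ET.parse(r'G:\My Drive\dru.xml').getroot() instead.
SAMPLE = """<drugbank xmlns="http://www.drugbank.ca" version="5.1">
  <drug type="biotech">
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
    <indication>For cardiovascular</indication>
  </drug>
</drugbank>"""

# Map a prefix of our choosing ("db") to the document's default namespace.
NS = {"db": "http://www.drugbank.ca"}

root = ET.fromstring(SAMPLE)

# Un-prefixed path: the tags are really "{http://www.drugbank.ca}drug", so nothing matches.
print(len(root.findall(".//drug")))  # 0

matches = []
for drug in root.findall(".//db:drug", NS):
    # findtext() returns the default when <indication> is missing and "" when it is empty.
    indication = drug.findtext("db:indication", default="", namespaces=NS)
    if "cardiovascular" in indication:
        name = drug.findtext("db:name", namespaces=NS)
        matches.append((name, indication))
        print(indication)
        print(name)
```

Writing the namespace out in full, as in './/{http://www.drugbank.ca}drug', works the same way; the prefix map just keeps the paths shorter.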

1 answer:

Answer 0 (score: 0)

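One approach that may help is to use a library that supports XPath contains() queries (this looks doable with lxml; ElementTree only supports a limited subset of XPath and has no contains()), and then navigate the tree to whatever node you want to return or whose values you want to read. Here is an example, at line 45 of something someone posted on GitHub: https://gist.github.com/IanHopkinson/ad45831a2fb73f537a79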
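The answer points to lxml, whose xpath() method supports full XPath 1.0, including contains(). If you would rather stay with the standard library, a rough equivalent is to run the "contains" test in Python and use ElementTree's limited XPath only for navigation. The sketch below assumes the records of interest are the <drug> elements that are direct children of <drugbank>. This keeps the nested <drug> nodes mentioned in the question from being reported as separate hits. find_drugs and the result keys are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"db": "http://www.drugbank.ca"}

def find_drugs(root, keyword):
    """Collect primary ID, name and indication for each top-level <drug>
    whose <indication> text contains `keyword` (case-insensitive)."""
    results = []
    # "db:drug" without a leading ".//" matches only direct children of
    # <drugbank>, so <drug> elements nested inside a record are skipped.
    for drug in root.findall("db:drug", NS):
        indication = drug.findtext("db:indication", default="", namespaces=NS)
        if keyword.lower() in indication.lower():
            results.append({
                # ElementTree supports [@attr='value'] predicates
                "id": drug.findtext("db:drugbank-id[@primary='true']", namespaces=NS),
                "name": drug.findtext("db:name", namespaces=NS),
                "indication": indication,
            })
    return results

# Hypothetical sample: the second record and the nested <drug> are invented
# placeholders, not DrugBank data.
SAMPLE = """<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
    <indication>For cardiovascular</indication>
    <related>
      <drug><name>Nested</name><indication>cardiovascular</indication></drug>
    </related>
  </drug>
  <drug>
    <drugbank-id primary="true">X0002</drugbank-id>
    <name>Other</name>
    <indication>For skin conditions</indication>
  </drug>
</drugbank>"""

for rec in find_drugs(ET.fromstring(SAMPLE), "Cardiovascular"):
    print(rec["id"], rec["name"])  # DB00001 Lepirudin
```

Lower-casing both sides makes the match case-insensitive, so "Cardiovascular" and "cardiovascular" in the indication text are both found. The nested <drug> and the non-matching record are not returned.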