Question

I have an XML file:

<root>
 <entry>
    <accession>A</accession>
    <accession>B</accession>
    <accession>C</accession>
    <feature type="cross-link" description="sumo2">
        <location>
            <position position="15111992"/>
        </location>
    </feature>
    <feature type="temp" description="blah blah sumo">
        <location>
            <position position="12345"/>
        </location>
    </feature>
</entry>
<entry>
  <accession>X</accession>
    <accession>Y</accession>
    <accession>Z</accession>
    <feature type="test" description="testing">
        <location>
            <position position="1"/>
        </location>
    </feature>
    <feature type="cross-link" description="sumo hello">
        <location>
            <position position="11223344"/>
        </location>
    </feature>
 </entry>
</root>

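I need to get the value of the position attribute for every feature whose type is "cross-link" and whose description contains the word sumo. This is what I have tried so far; it correctly gives me the features whose type is "cross-link" and whose description contains the word sumo.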

from xml.dom import minidom
xmldoc = minidom.parse('P38398.xml')
itemlist = xmldoc.getElementsByTagName('feature')

for s in itemlist:
    feattype = s.attributes['type'].value
    description = s.attributes['description'].value
    if "SUMO" in description:
        if "cross-link" in feattype:
            print feattype+","+description

Once I have a feature whose type is "cross-link" and whose description contains the word "sumo", how do I extract the position value?

Answer 1

You are almost there, except for two things:

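You have to change the "SUMO" search pattern to lowercase ("sumo") so that it matches the data given above, because the `in` test on strings is case-sensitive.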

Then you need to add the following to the loop body, inside the innermost if:

posList = s.getElementsByTagName('position')
for p in posList:
    print "-- position is {}".format(p.attributes['position'].value)
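Putting both changes together, the whole loop could look roughly like the sketch below. It is written for Python 3 and parses a shortened copy of the sample data from a string so that it runs on its own. The function name `sumo_crosslink_positions`, the `SAMPLE` constant, the exact comparison against "cross-link", and the `.lower()` call (which makes the match case-insensitive) are assumptions added for illustration and are not part of the original code.

```python
from xml.dom import minidom

# Shortened copy of the sample data from the question (assumption: same structure).
SAMPLE = """<root>
  <entry>
    <feature type="cross-link" description="sumo2">
      <location><position position="15111992"/></location>
    </feature>
    <feature type="temp" description="blah blah sumo">
      <location><position position="12345"/></location>
    </feature>
  </entry>
  <entry>
    <feature type="test" description="testing">
      <location><position position="1"/></location>
    </feature>
    <feature type="cross-link" description="sumo hello">
      <location><position position="11223344"/></location>
    </feature>
  </entry>
</root>"""


def sumo_crosslink_positions(xml_text):
    """Return the position values of cross-link features mentioning sumo."""
    doc = minidom.parseString(xml_text)
    positions = []
    for feature in doc.getElementsByTagName('feature'):
        feattype = feature.getAttribute('type')
        description = feature.getAttribute('description')
        # Lowercasing the description is an assumption: it matches "sumo" and "SUMO" alike.
        if feattype == 'cross-link' and 'sumo' in description.lower():
            # Only <position> elements nested inside this feature are visited.
            for pos in feature.getElementsByTagName('position'):
                positions.append(pos.getAttribute('position'))
    return positions


print(sumo_crosslink_positions(SAMPLE))  # ['15111992', '11223344']
```

To run it against the real file, `minidom.parse('P38398.xml')` can be used in place of `minidom.parseString(xml_text)`.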

Answer 2

This is a job for XPath. Just check for an attribute match and a substring match, then return the position attribute as a string.

from lxml import etree
root = etree.parse('P38398.xml').getroot()
xpquery = '//feature[@type="cross-link" and contains(@description, "sumo")]//position/@position'
for att in root.xpath(xpquery):
    print(att)
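If installing lxml is not an option, a similar result can be sketched with the standard library's xml.etree.ElementTree. Its limited XPath subset handles the `[@type='cross-link']` predicate but has no `contains()` function, so the substring test is done in Python here. The names `positions_with_elementtree` and `DATA` are made up for this illustration, and the data is a shortened copy of the sample from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample data from the question (assumption: same structure).
DATA = """<root>
  <entry>
    <feature type="cross-link" description="sumo2">
      <location><position position="15111992"/></location>
    </feature>
    <feature type="temp" description="blah blah sumo">
      <location><position position="12345"/></location>
    </feature>
  </entry>
  <entry>
    <feature type="cross-link" description="sumo hello">
      <location><position position="11223344"/></location>
    </feature>
  </entry>
</root>"""


def positions_with_elementtree(xml_text):
    """Collect position values using only the standard library."""
    root = ET.fromstring(xml_text)
    result = []
    # The attribute-equality predicate is part of ElementTree's XPath subset.
    for feature in root.iterfind(".//feature[@type='cross-link']"):
        # contains() is not available there, so check the substring in Python.
        if 'sumo' in feature.get('description', ''):
            result.extend(p.get('position') for p in feature.iter('position'))
    return result


print(positions_with_elementtree(DATA))  # ['15111992', '11223344']
```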

Parsing XML in Python using minidom
