Question

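I'm using Python and lxml to process XML. After querying/filtering I get the node I want, but I've run into a problem: how do I get the value of one of its attributes with XPath? Here is a sample of my input.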

>print(etree.tostring(node, pretty_print=True ))
<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>

The value I want is the one in resource = .... At the moment I just use lxml to get the value. I'm wondering whether this can be done in pure XPath. Thanks.

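Edit: I forgot to mention that this is not the root node, so I can't use // here. I have about 2000-3000 others like it in the XML file. My first attempts were ".@attrib" and "self::*@", but those don't seem to work.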

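EDIT2: I'll try my best to explain (well, this is my first time dealing with an XML problem using XPath, and English is not one of my favorite subjects...). Here is my input snippet: http://pastebin.com/kZmVdbQQ (the full file is version 4 from http://www.comp-sys-bio.org/yeastnet/).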

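In my code, I try to get the speciesTypes nodes that have a chebi resource link (<rdf:li rdf:resource="urn:miriam:obo.chebi:...."/>). Then I try to get the value of the rdf:resource attribute in rdf:li. The thing is, I'm pretty sure it would be easy to get the attribute of a child node if I started from a parent node such as speciesTypes, but I wonder how to do it when starting from rdf:li itself. From my understanding, "//" in XPath looks for nodes from anywhere, not just within the current node.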

Below is my code:

import lxml.etree as etree

tree = etree.parse("yeast_4.02.xml")
root = tree.getroot()
ns = {"sbml": "http://www.sbml.org/sbml/level2/version4", 
      "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "body":"http://www.w3.org/1999/xhtml",
      "re": "http://exslt.org/regular-expressions"
      }
#good enough for now
maybemeta = root.xpath("//sbml:speciesType[descendant::rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]]", namespaces = ns)

def extract_name_and_chebi(node):
    name = node.attrib['name']
    chebies = node.xpath("./sbml:annotation//rdf:li[starts-with(@rdf:resource, 'urn:miriam:obo.chebi') and not(starts-with(@rdf:resource, 'urn:miriam:uniprot'))]", namespaces=ns) #get all rdf:li node with chebi resource
    assert len(chebies) == 1
    #my current solution to get rdf:resource value from rdf:li node
    rdfNS = "{" + ns.get('rdf') + "}"
    chebi = chebies[0].attrib[rdfNS + 'resource'] 
    #do protein later
    return (name, chebi)

metaWithChebi = map(extract_name_and_chebi, maybemeta)

# the with block closes the output file once everything is written
with open("metabolites.txt", "w") as fo:
    for name, chebi in metaWithChebi:
        fo.write("{0}\t{1}\n".format(name, chebi))

Answer 1

Prefix the attribute name with @ in the XPath query:

>>> from lxml import etree
>>> xml = b"""\
... <?xml version="1.0" encoding="utf8"?>
... <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
...     <rdf:li rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>
... </rdf:RDF>
... """
>>> tree = etree.fromstring(xml)
>>> ns = {'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
>>> tree.xpath('//rdf:li/@rdf:resource', namespaces=ns)
['urn:miriam:obo.chebi:CHEBI%3A37671']
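
One caveat about the leading // in that query: it is an absolute path, so it searches the whole document even when xpath() is called on a node further down the tree. To stay inside the current node, start the expression with .// instead. Here is a minimal sketch of the difference; the two-item document and its urn:example values are made up for illustration and are not from the yeast file.

```python
from lxml import etree

ns = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

# Hypothetical document with two items, each holding one rdf:li.
doc = etree.fromstring(
    b'<root xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<item name="a"><rdf:li rdf:resource="urn:example:A"/></item>'
    b'<item name="b"><rdf:li rdf:resource="urn:example:B"/></item>'
    b'</root>'
)
second_item = doc[1]

# '//' starts at the document root, regardless of the context node.
from_root = second_item.xpath("//rdf:li/@rdf:resource", namespaces=ns)

# './/' starts at the context node and only looks at its descendants.
from_node = second_item.xpath(".//rdf:li/@rdf:resource", namespaces=ns)

print(from_root)  # both resources
print(from_node)  # only the resource under the second item
```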

Edit

Here is a revised version of the script in the question:

import lxml.etree as etree

ns = {
    'sbml': 'http://www.sbml.org/sbml/level2/version4',
    'rdf':'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
    'body':'http://www.w3.org/1999/xhtml',
    're': 'http://exslt.org/regular-expressions',
    }

def extract_name_and_chebi(node):
    chebies = node.xpath("""
        .//rdf:li[
        starts-with(@rdf:resource, 'urn:miriam:obo.chebi')
        ]/@rdf:resource
        """, namespaces=ns)
    return node.attrib['name'], chebies[0]

with open('yeast_4.02.xml') as xml:
    tree = etree.parse(xml)

maybemeta = tree.xpath("""
    //sbml:speciesType[descendant::rdf:li[
    starts-with(@rdf:resource, 'urn:miriam:obo.chebi')]]
    """, namespaces = ns)

with open('metabolites.txt', 'w') as output:
    for node in maybemeta:
        output.write('%s\t%s\n' % extract_name_and_chebi(node))

Answer 2

To select the attribute named rdf:resource of the current node, use this XPath expression:

@rdf:resource

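For this to "work correctly", you must register the association of the prefix "rdf:" with the corresponding namespace.

If you don't know how to register the rdf namespace, you can still select the attribute using this XPath expression: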

@*[name()='rdf:resource']
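
In lxml, "registering" the prefix amounts to passing a namespaces mapping to xpath(). The sketch below runs both expressions against the rdf:li node from the question. The third variant, which uses local-name() and namespace-uri(), is an addition of mine rather than part of this answer: name() compares against the prefix as it is written in the document, so it stops matching if the document binds the same namespace to a different prefix.

```python
from lxml import etree

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The rdf:li node from the question, parsed on its own.
li = etree.fromstring(
    b'<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    b' rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# 1. Prefix registered through the namespaces mapping.
with_prefix = li.xpath("@rdf:resource", namespaces={"rdf": RDF_NS})

# 2. No mapping needed, but tied to the prefix used in the document.
by_name = li.xpath("@*[name()='rdf:resource']")

# 3. No mapping needed and independent of the prefix (my addition);
#    $uri is an XPath variable supplied as a keyword argument.
by_uri = li.xpath(
    "@*[local-name()='resource' and namespace-uri()=$uri]", uri=RDF_NS
)

print(with_prefix, by_name, by_uri)
```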

Answer 3

Hmm, I see. The XPath expression I need is "./@rdf:resource" rather than ".@rdf:resource". But why? I thought "./" means a child of the current node.
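
As far as I understand the XPath 1.0 grammar, the reason is this: "." abbreviates the step self::node(), "/" only separates steps, and "@" abbreviates the attribute:: axis. So "./@rdf:resource" reads "the current node, then its rdf:resource attribute". "./name" selects a child only because child:: is the default axis when a step names no axis. ".@rdf:resource" puts two steps next to each other with no separator, which is a syntax error. A small sketch to check this with lxml, using the rdf:li node from the question:

```python
from lxml import etree

ns = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}

li = etree.fromstring(
    b'<rdf:li xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    b' rdf:resource="urn:miriam:obo.chebi:CHEBI%3A37671"/>'
)

# Different spellings of "the rdf:resource attribute of the context node".
equivalent = [
    "@rdf:resource",                         # abbreviated, relative
    "./@rdf:resource",                       # explicit self step first
    "self::*/@rdf:resource",                 # self axis spelled out
    "attribute::rdf:resource",               # attribute axis spelled out
    "self::node()/attribute::rdf:resource",  # fully unabbreviated
]
results = [li.xpath(expr, namespaces=ns) for expr in equivalent]

# Two steps without a '/' between them do not form a valid expression.
try:
    li.xpath(".@rdf:resource", namespaces=ns)
    rejected = False
except etree.XPathError:
    rejected = True

print(results)
print(rejected)
```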

XPath: select an attribute of the current node?
