lxml XPath not working on RDF

Date: 2015-07-24 05:41:01

Tags: python xml xpath lxml rdf

I am trying to extract the /RDF/Description/id/text() string, which should be someid in the document below. What is the appropriate XPath to extract it using Python's lxml?

<?xml version="1.0" encoding="utf-8"?>
    <!-- This Source Code Form is subject to the terms of the Mozilla Public
       - License, v. 2.0. If a copy of the MPL was not distributed with this
       - file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
    <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:em="http://www.mozilla.org/2004/em-rdf#">
      <Description about="urn:mozilla:install-manifest">
        <em:id>my-extension@mozilla</em:id>
        <em:version>initial</em:version>
        <em:type>2</em:type>
        <em:bootstrap>true</em:bootstrap>
        <em:unpack>false</em:unpack>

        <!-- Firefox -->
        <em:targetApplication>
            <Description>
                <em:id>{someid}</em:id>
                <em:minVersion>7.0</em:minVersion>
                <em:maxVersion>27.0</em:maxVersion>
            </Description>
        </em:targetApplication>

        <!-- Front End MetaData -->
        <!-- must provide default non-localized because It's used as a default on AMO. It's used as a default by the add-on manager, with the possibility of other locales overriding it. Failure to provide a non-localized name will lead to failed upload on AMO. -->
        <em:name>l10n</em:name>
        <em:description>ff-addon-demo: Shows how to localize restartless add-ons.</em:description>
        <em:creator>Noitidart</em:creator>
        <!-- start localizing -->
        <em:localized>
            <Description>
                <em:locale>en-GB</em:locale>
                <em:name>l10n :: en-GB</em:name>
                <em:description>en-GB :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
                <em:creator>en-GB :: Noitidart</em:creator>
            </Description>
        </em:localized>
        <em:localized>
            <Description>
                <em:locale>en-US</em:locale>
                <em:name>l10n :: en-US</em:name>
                <em:description>en-US :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
                <em:creator>en-US :: Noitidart</em:creator>
            </Description>
        </em:localized>
      </Description>
    </RDF>

I have literally tried all of these: "*/*[4]" , "*/*[4]" , "*/*" , "@my:*" , "em:*" , "my:*" , "@*" , "//id" , "//em:id" , "//em" , "//*[text()='USA']" , "{http://www.mozilla.org/2004/em-rdf#}:localized" , "*/*" , "//tag:RDF" , "//*RDF" , "/RDF/Description/em:targetApplication" , "*/localized" , "*/*localized" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/http://www.mozilla.org/2004/em-rdf#" , "*/RDF" , "*/*" , "/RDF" , "//RDF" , "/RDF", ".//Description" , "//?xml" , "//about" , "//em" , "//Description" , "/RDF" , "*/*" , "*/Description" , "*/Descriptoin" , "*" , "./?xml" , "?xml" , "//?xml" , "//http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "//version" , "//xml" , "//" , "//RDF" , "./version" , "version" , "xml" , "/RDF/Description/*" , "/RDF/Description", and wasted a lot of time on it.

Edit: after the solution below, I found a good reference document for this common problem: https://msdn.microsoft.com/en-us/library/ms950779.aspx

1 Answer:

Answer 0 (score: 0)

Here is one possible way. Notice how the XPath mirrors the XML structure, and how each element that lives in a namespace is referenced through a prefix:

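from lxml import etree

# Pass bytes, not str: the XML in the question starts with an encoding
# declaration, which lxml rejects in a str input (ValueError).
xml = b"""your xml as posted in question here"""
root = etree.fromstring(xml)
# XPath 1.0 has no default namespace, so the document's default (RDF)
# namespace must be bound to an explicit prefix; 'd' is an arbitrary choice.
nsmap = {'d': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
         'em': 'http://www.mozilla.org/2004/em-rdf#'}
result = root.xpath("/d:RDF/d:Description/em:targetApplication/d:Description/em:id/text()",
                    namespaces=nsmap)
print(result)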

Output

['{someid}']
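
For context on why the attempts in the question came back empty: in XPath 1.0 an unprefixed name such as id or RDF only matches elements in no namespace, while every element in this document belongs to either the RDF namespace (declared as the default) or the em namespace. The sketch below is a rough illustration rather than a definitive recipe. It uses a trimmed-down stand-in for the manifest (not the full file from the question), and the prefix 'd' and the variable names are arbitrary choices made for the example. It compares an unprefixed query, a prefixed one, a local-name() query that ignores namespaces, and lxml's find-style lookup with {namespace}tag names.

```python
from lxml import etree

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"

# Trimmed-down stand-in for the manifest in the question (an assumption
# made for brevity). Bytes are used because of the XML declaration.
xml = b"""<?xml version="1.0" encoding="utf-8"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
  <Description about="urn:mozilla:install-manifest">
    <em:id>my-extension@mozilla</em:id>
    <em:targetApplication>
      <Description>
        <em:id>{someid}</em:id>
      </Description>
    </em:targetApplication>
  </Description>
</RDF>"""
root = etree.fromstring(xml)

# Prefixes are local to this mapping; only the URIs have to match the document.
ns = {"d": RDF_NS, "em": EM_NS}

# 1) Unprefixed name: looks for <id> in no namespace, so nothing matches.
no_prefix = root.xpath("//id")

# 2) Prefixed names resolved through the mapping.
with_prefix = root.xpath("//em:targetApplication//em:id/text()", namespaces=ns)

# 3) Namespace-agnostic match on the local part of the name.
by_local_name = root.xpath(
    "//*[local-name()='targetApplication']//*[local-name()='id']/text()"
)

# 4) find-style lookup with {namespace}tag names, no prefix mapping needed.
path = ".//{%s}targetApplication/{%s}Description/{%s}id" % (EM_NS, RDF_NS, EM_NS)
clark = root.findtext(path)

print(no_prefix)
print(with_prefix)
print(by_local_name)
print(clark)
```

The local-name() form is convenient for quick exploration, but it also matches same-named elements from unrelated namespaces, so the prefixed form is usually the safer choice once the namespaces are known.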