I am trying to extract the string at /RDF/Description/id/text(),
which should be someid, from the XML below.
What is the appropriate XPath for extracting it with Python's lxml?
<?xml version="1.0" encoding="utf-8"?>
<!-- This Source Code Form is subject to the terms of the Mozilla Public
- License, v. 2.0. If a copy of the MPL was not distributed with this
- file, You can obtain one at http://mozilla.org/MPL/2.0/. -->
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:em="http://www.mozilla.org/2004/em-rdf#">
<Description about="urn:mozilla:install-manifest">
<em:id>my-extension@mozilla</em:id>
<em:version>initial</em:version>
<em:type>2</em:type>
<em:bootstrap>true</em:bootstrap>
<em:unpack>false</em:unpack>
<!-- Firefox -->
<em:targetApplication>
<Description>
<em:id>{someid}</em:id>
<em:minVersion>7.0</em:minVersion>
<em:maxVersion>27.0</em:maxVersion>
</Description>
</em:targetApplication>
<!-- Front End MetaData -->
<!-- must provide default non-localized because It's used as a default on AMO. It's used as a default by the add-on manager, with the possibility of other locales overriding it. Failure to provide a non-localized name will lead to failed upload on AMO. -->
<em:name>l10n</em:name>
<em:description>ff-addon-demo: Shows how to localize restartless add-ons.</em:description>
<em:creator>Noitidart</em:creator>
<!-- start localizing -->
<em:localized>
<Description>
<em:locale>en-GB</em:locale>
<em:name>l10n :: en-GB</em:name>
<em:description>en-GB :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
<em:creator>en-GB :: Noitidart</em:creator>
</Description>
</em:localized>
<em:localized>
<Description>
<em:locale>en-US</em:locale>
<em:name>l10n :: en-US</em:name>
<em:description>en-US :: ff-addon-demo: Shows how to localize restartless add-ons. </em:description>
<em:creator>en-US :: Noitidart</em:creator>
</Description>
</em:localized>
</Description>
</RDF>
I have actually tried all of these: "*/*[4]" , "*/*[4]" , "*/*" , "@my:*" , "em:*" , "my:*" , "@*" , "//id" , "//em:id" , "//em" , "//*[text()='USA']" , "{http://www.mozilla.org/2004/em-rdf#}:localized" , "*/*" , "//tag:RDF" , "//*RDF" , "/RDF/Description/em:targetApplication" , "*/localized" , "*/*localized" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/*" , "*/http://www.mozilla.org/2004/em-rdf#" , "*/RDF" , "*/*" , "/RDF" , "//RDF" , "/RDF", ".//Description" , "//?xml" , "//about" , "//em" , "//Description" , "/RDF" , "*/*" , "*/Description" , "*/Descriptoin" , "*" , "./?xml" , "?xml" , "//?xml" , "//http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF" , "//version" , "//xml" , "//" , "//RDF" , "./version" , "version" , "xml" , "/RDF/Description/*" , "/RDF/Description"
and wasted a lot of time on it.
Edit: after the solution below, I found a good reference document for this common problem: https://msdn.microsoft.com/en-us/library/ms950779.aspx
Answer 0 (score: 0)
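Here is one possible way. Note how the XPath mirrors the XML structure, and how each element is referenced through a prefix bound to its namespace. Even the elements in the default namespace (RDF, Description) need a prefix in the XPath, because an unprefixed name in XPath matches only elements that are in no namespace: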
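from lxml import etree
# Use a bytes literal: lxml rejects str input that carries an encoding declaration.
xml = b"""your xml as posted in question here"""
root = etree.fromstring(xml)
# The prefixes are arbitrary; 'd' is bound to the document's default namespace.
nsmap = {'d': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
         'em': 'http://www.mozilla.org/2004/em-rdf#'}
result = root.xpath("/d:RDF/d:Description/em:targetApplication/d:Description/em:id/text()",
                    namespaces=nsmap)
print(result)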
Output
['{someid}']
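
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes a trimmed copy of the manifest from the question (only the elements on the path to the target id are kept), and the names XML, RDF_NS, EM_NS and NSMAP are made up for illustration. It shows why the unprefixed attempts in the question come back empty, and two alternatives that should give the same result as the prefixed XPath: Clark notation ({namespace}tag) with findtext, and a local-name() test that ignores namespaces.

```python
from lxml import etree

# Trimmed copy of the manifest from the question. It is a bytes literal
# because lxml rejects str input that carries an encoding declaration.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:em="http://www.mozilla.org/2004/em-rdf#">
  <Description about="urn:mozilla:install-manifest">
    <em:id>my-extension@mozilla</em:id>
    <em:targetApplication>
      <Description>
        <em:id>{someid}</em:id>
      </Description>
    </em:targetApplication>
  </Description>
</RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EM_NS = "http://www.mozilla.org/2004/em-rdf#"
# Prefix names are arbitrary; 'd' stands in for the default namespace.
NSMAP = {"d": RDF_NS, "em": EM_NS}

root = etree.fromstring(XML)

# Unprefixed names only match elements in no namespace, so this finds nothing.
no_prefix = root.xpath("/RDF/Description")

# Every step carries a prefix bound to the element's namespace.
with_prefix = root.xpath(
    "/d:RDF/d:Description/em:targetApplication/d:Description/em:id/text()",
    namespaces=NSMAP,
)

# Clark notation spells out the namespace URI in each step; the path is
# relative to root (the RDF element), so it starts at Description.
clark = root.findtext(
    f"{{{RDF_NS}}}Description/{{{EM_NS}}}targetApplication/"
    f"{{{RDF_NS}}}Description/{{{EM_NS}}}id"
)

# local-name() compares the tag name alone and ignores the namespace.
by_local_name = root.xpath(
    "//*[local-name()='targetApplication']//*[local-name()='id']/text()"
)

print(no_prefix, with_prefix, clark, by_local_name)
```

The prefixed XPath is probably the most readable of the three. The local-name() form is convenient for quick exploration, but it would also match an id element from an unrelated namespace. If the braces are not wanted, something like with_prefix[0].strip("{}") would remove them.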