I have this XML string:
<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:libraries="http://www.ibm.com/websphere/appserver/schemas/5.0/libraries.xmi">
<libraries:Library xmi:id="Library_1382473016602" name="sfi_lib" isolatedClassLoader="false">
<classPath>${HOME_SFI_LIB}/sfi_com_sqw_java.jar</classPath>
</libraries:Library>
<libraries:Library xmi:id="Library_1528914932212" name="sfi_lib_server" isolatedClassLoader="false">
<classPath>${HOME_SFI_LIB}/jasper/jasperreports-5.6.0.jar</classPath>
<classPath>${HOME_SFI_LIB}/jasper/jasperreports-fonts-3.7.4.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/commons-beanutils-1.8.2.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/commons-collections-3.2.1.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/commons-digester-2.1.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/commons-discovery-0.2.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/commons-logging-1.1.1.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/xml-apis.jar</classPath>
<classPath>${HOME_SFI_LIB}/commons/iText-2.1.7.jar</classPath>
<classPath>${HOME_SFI_LIB}/jasper/barbecue-1.5-beta1.jar</classPath>
<classPath>${HOME_SFI_LIB}/bouncycastle/bcprov-jdk15-1.45.jar</classPath>
<classPath>${HOME_SFI_LIB}/bouncycastle/bcmail-jdk15-1.45.jar</classPath>
<classPath>${HOME_SFI_LIB}/bouncycastle/bctsp-jdk14-1.45.jar</classPath>
<classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
<classPath>${HOME_SFI_LIB}/sfi_framework_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_adm_ama_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_adm_gdl_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_adm_prt_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_com_acg_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_com_sca_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_com_tge_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_com_utl_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_ext_sge_java.jar</classPath>
</libraries:Library>
</xmi:XMI>
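What I want to do is get the values of the elements that start with ${HOME_SFI_LIB}/sfi_. I am using Python's re module for this. My current code only filters by the classPath tag, which is not enough. The regular expression I currently use: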
re.findall('<classPath>(.*?)</classPath>', xml)
Can someone help me improve the regex so that it only matches elements starting with ${HOME_SFI_LIB}/sfi_, such as the node <classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>?
Answer 0 (score: 1)
As this post points out, it is better to use an XML parser such as lxml to navigate markup languages such as XML, HTML, and XHTML:
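from lxml import etree

# Binary mode lets the parser honor the file's own encoding declaration
with open('your_file.xml', 'rb') as fh:
    tree = etree.parse(fh)

# tree is now an ElementTree instance that can be searched by tag;
# the XPath selector below returns a list of matching elements
class_paths = tree.xpath('//classPath')
for c in class_paths:
    # c.text is None for empty elements, so check it before testing the prefix
    if c.text and c.text.startswith('${HOME_SFI_LIB}/sfi_'):
        print(c.text)  # rest of your code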
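While you could argue that a regex approach works for a simple XML document, in general a tree makes this process easier to extend to larger, more complex documents.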
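For the simple-document case, a minimal regex sketch of what the question asks for might look like the following. The `xml` string here is a shortened stand-in for the one in the question, and the pattern assumes that classPath elements carry no attributes and no nested markup; neither assumption is guaranteed for other documents.

```python
import re

# Shortened stand-in for the XML string from the question (assumption).
xml = """<libraries:Library name="sfi_lib_server">
<classPath>${HOME_SFI_LIB}/commons/xml-apis.jar</classPath>
<classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
<classPath>${HOME_SFI_LIB}/sfi_framework_java.jar</classPath>
<classPath>${HOME_SFI_LIB}/sfi_adm_gce_java.jar</classPath>
</libraries:Library>"""

# Escape the literal prefix so "$", "{" and "}" are not treated as regex syntax.
prefix = re.escape("${HOME_SFI_LIB}/sfi_")

# Capture the prefix plus everything up to the next "<" (the closing tag).
pattern = re.compile(r"<classPath>(" + prefix + r"[^<]*)</classPath>")

matches = pattern.findall(xml)
print(matches)
```

Using `re.escape` avoids hand-escaping the `${...}` placeholder, which is where a hand-written pattern is most likely to go wrong.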
If you cannot pip install lxml, the standard library includes the xml package, which works in a very similar way:
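from xml.etree import ElementTree as ET

# Binary mode lets the parser honor the file's own encoding declaration
with open('your_file.xml', 'rb') as fh:
    tree = ET.parse(fh)

for element in tree.iterfind('.//classPath'):
    # element.text is None for empty elements, so check it before testing the prefix
    if element.text and element.text.startswith('${HOME_SFI_LIB}/sfi_'):
        print(element.text)  # rest of your code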
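Since the question starts from an XML string rather than a file, here is a self-contained sketch of the same idea with the standard library, parsing from a string and collecting the matching values into a list. The `xml` value is a shortened copy of the question's document, and the names `PREFIX` and `sfi_paths` are illustrative only. It uses `str.startswith` rather than a substring test, which is an interpretation of "starts with" in the question.

```python
from xml.etree import ElementTree as ET

# Shortened copy of the question's document (assumption: real input has the same shape).
xml = """<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI" xmlns:libraries="http://www.ibm.com/websphere/appserver/schemas/5.0/libraries.xmi">
<libraries:Library xmi:id="Library_1382473016602" name="sfi_lib" isolatedClassLoader="false">
<classPath>${HOME_SFI_LIB}/sfi_com_sqw_java.jar</classPath>
</libraries:Library>
<libraries:Library xmi:id="Library_1528914932212" name="sfi_lib_server" isolatedClassLoader="false">
<classPath>${HOME_SFI_LIB}/jasper/jasperreports-5.6.0.jar</classPath>
<classPath>${HOME_SFI}/sfi_arquivos/templates</classPath>
<classPath>${HOME_SFI_LIB}/sfi_framework_java.jar</classPath>
</libraries:Library>
</xmi:XMI>"""

PREFIX = "${HOME_SFI_LIB}/sfi_"  # hypothetical constant name

root = ET.fromstring(xml)

# classPath has no namespace prefix, so the plain tag name matches it.
sfi_paths = [
    el.text
    for el in root.iter("classPath")
    if el.text and el.text.startswith(PREFIX)
]
print(sfi_paths)
```

Note that `${HOME_SFI}/sfi_arquivos/templates` is excluded because its variable name differs, which a looser pattern such as "contains sfi_" would not catch.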