I know this question is a common one, but the example below is considerably more complex than the title suggests.
Suppose I have the following "test.xml" file:
<?xml version="1.0" encoding="UTF-8"?>
<test:xml xmlns:test="http://com/whatever/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <parent xsi:type="parentType">
        <child xsi:type="childtype">
            <grandchild>
                <greatgrandchildone>greatgrandchildone</greatgrandchildone>
                <greatgrandchildtwo>greatgrandchildtwo</greatgrandchildtwo>
            </grandchild><!--random comment -->
        </child>
        <child xsi:type="childtype">
            <greatgrandchildthree>greatgrandchildthree</greatgrandchildthree>
            <greatgrandchildfour>greatgrandchildfour</greatgrandchildfour><!--another random comment -->
        </child>
        <child xsi:type="childtype">
            <greatgrandchildthree>greatgrandchildthree</greatgrandchildthree>
            <greatgrandchildfour>greatgrandchildfour</greatgrandchildfour><!--third random comment -->
        </child>
    </parent>
</test:xml>
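The program below does two things: it collects the paths of all nodes that carry an xsi:type attribute (get_all_node_types), and then, for every node in the document, it checks which of those paths occur in that node's own path (check_type_in_path).
Here is my code: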
from lxml import etree
import re

xmlDoc = etree.parse("test.xml")
root = xmlDoc.getroot()
nsmap = {
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance'
}
nodesWithType = []

def check_type_in_path(nodesWithType, path, root):
    typesInPath = []
    elementType = ""
    for node in nodesWithType:
        print("checking node: ", node, " and path: ", path)
        if re.search(r"\b{}\b".format(
                node), path, re.IGNORECASE) is not None:
            element = root.find('.//{0}'.format(node))
            elementType = element.attrib.get(f"{{{nsmap['xsi']}}}type")
            if elementType is not None:
                print("found an element for this path. adding to list")
                typesInPath.append(elementType)
        else:
            print("element: ", node, " not found in path: ", path)
    print("path ", path, " has types: ", elementType)
    print("-------------------")
    return typesInPath

def get_all_node_types(xmlDoc):
    nodesWithType = []
    root = xmlDoc.getroot()
    for node in xmlDoc.iter():
        path = "/".join(xmlDoc.getpath(node).strip("/").split('/')[1:])
        if "COMMENT" not in path.upper():
            element = root.find('.//{0}'.format(path))
            elementType = element.attrib.get(f"{{{nsmap['xsi']}}}type")
            if elementType is not None:
                nodesWithType.append(path)
    return nodesWithType

nodesWithType = get_all_node_types(xmlDoc)
print("nodesWithType: ", nodesWithType)

for node in xmlDoc.xpath('//*'):
    path = "/".join(xmlDoc.getpath(node).strip("/").split('/')[1:])
    typesInPath = check_type_in_path(nodesWithType, path, root)
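The code should return all of the types found along a given path. For example, consider the path parent/child[3]/greatgrandchildfour. This node is a descendant (direct or indirect) of two nodes that have a "type" attribute: parent and parent/child[3]. I would therefore expect the typesInPath list for this node to contain both "parentType" and "childtype".
However, according to the printed output, the list for this node contains only "parentType" and not "childtype". The core of the logic is to check whether the path of each typed node is contained in the path of the node in question (hence the exact string match with a regular expression). That clearly does not work. I am not sure whether it is because the index notation in the path (the [3]) prevents the condition from matching, or because of something else.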
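To isolate the string-matching step from lxml, here is a minimal sketch that runs the same re.search call on hard-coded paths. It seems to suggest that the brackets are the problem: in a regular expression, [3] is a character class that matches the single character 3, so the pattern built from parent/child[3] looks for the text parent/child3. The helper names matches_as_written and matches_escaped are hypothetical and not part of my program; the second one only checks whether escaping the node path with re.escape would be enough.

```python
import re

def matches_as_written(node, path):
    # Same test as in check_type_in_path: the node path is used as a raw pattern.
    pattern = r"\b{}\b".format(node)
    return re.search(pattern, path, re.IGNORECASE) is not None

def matches_escaped(node, path):
    # Hypothetical variant: re.escape makes "[3]" literal, the \b anchors stay.
    pattern = r"\b{}\b".format(re.escape(node))
    return re.search(pattern, path, re.IGNORECASE) is not None

path = "parent/child[3]/greatgrandchildfour"

print(matches_as_written("parent", path))                      # True
print(matches_as_written("parent/child[3]", path))             # False
print(matches_as_written("parent/child[3]", "parent/child3"))  # True
print(matches_escaped("parent/child[3]", path))                # False
```

If this reading is right, escaping alone does not help either: the trailing \b needs a word character on one side, and it sits between ] and /, which are both non-word characters.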
For the path parent/child[3]/greatgrandchildfour, the program prints:
checking node: parent and path: parent/child[3]/greatgrandchildfour
found an element for this path. adding to list
checking node: parent/child[1] and path: parent/child[3]/greatgrandchildfour
element: parent/child[1] not found in path: parent/child[3]/greatgrandchildfour
checking node: parent/child[2] and path: parent/child[3]/greatgrandchildfour
element: parent/child[2] not found in path: parent/child[3]/greatgrandchildfour
checking node: parent/child[3] and path: parent/child[3]/greatgrandchildfour
element: parent/child[3] not found in path: parent/child[3]/greatgrandchildfour
path parent/child[3]/greatgrandchildfour has types: parentType
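For reference, the result I am after can be sketched without regular expressions by comparing whole path steps. This is only an illustration under assumptions: typed_paths is a hypothetical dictionary filled in by hand from test.xml (in the real program it would have to be built from the document), and types_along_path is a made-up helper name. Like my original logic, it counts the node itself if it has a type.

```python
# Hypothetical lookup table: path of each typed node -> its xsi:type value,
# written out by hand from test.xml instead of being read with lxml.
typed_paths = {
    "parent": "parentType",
    "parent/child[1]": "childtype",
    "parent/child[2]": "childtype",
    "parent/child[3]": "childtype",
}

def types_along_path(path, typed_paths):
    """Return the types of every typed node on the way down to `path`."""
    steps = path.split("/")
    found = []
    # Build "parent", "parent/child[3]", ... one step at a time and look
    # each prefix up literally, so "[3]" is never interpreted as a pattern.
    for depth in range(1, len(steps) + 1):
        prefix = "/".join(steps[:depth])
        if prefix in typed_paths:
            found.append(typed_paths[prefix])
    return found

print(types_along_path("parent/child[3]/greatgrandchildfour", typed_paths))
# → ['parentType', 'childtype']
```

Because each prefix is compared as a complete string, parent/child[1] would also not be confused with a longer step such as parent/child[10].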