Question

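I'm new to this. My original XML is about 8 GB, so it is hard to manually explore all the parents, grandparents, great-grandparents and so on of the child I am interested in. I am trying to walk through all the nodes until I find that child. So I want to create a "skeleton" of the XML structure down to the child of interest, using country_data.xml from https://docs.python.org/2/library/xml.etree.elementtree.html. Sorry for the code: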

def LookThrougStructure(parent, xpath_str, stop_flag):
    out_str.write('Parent tag: %s\n' % (parent.tag))
    for child in parent:
        if child.tag == my_tag:
            out_str.write('Child tag: %s\n' % (child.tag))
            #my_node_is_found_flag = 1
            break
        LookThrougStructure(child, child.tag, 0)
    return  
import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
my_tag = 'neighbor'
out_str = open('xml_structure.txt', 'w')
LookThrougStructure(root, root.tag, my_tag)
out_str.close()

It works incorrectly and prints the tags of all nodes:

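Parent tag: data
Parent tag: country
Parent tag: rank
Parent tag: year
Parent tag: gdppc
Child tag: neighbor
Parent tag: country
Parent tag: rank
Parent tag: year
Parent tag: gdppc
Child tag: neighbor
Parent tag: country
Parent tag: rank
Parent tag: year
Parent tag: gdppc
Child tag: neighbor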

But I want something like this (the child I am interested in is "neighbor"):

data
- country
  - neighbor

Or like this: /data/country/neighbor. What is wrong?

Answer 1

If I understand correctly, you want something like this:

def look_through_structure(parent, my_tag):
    # iter("*") walks every element below (and including) parent
    for node in parent.iter("*"):
        out_str.write('Parent tag: %s\n' % node.tag)
        for nxt in node:
            if nxt.tag == my_tag:
                out_str.write('child tag: %s\n' % my_tag)
                return
            out_str.write('Parent tag: %s\n' % nxt.tag)
            # iterate nxt directly; getchildren() was removed in Python 3.9
            if any(ch.tag == my_tag for ch in nxt):
                out_str.write('child tag: %s\n' % my_tag)
                return

If we change the function slightly so that it yields the tags:

def look_through_structure(parent, my_tag):
    for node in parent.iter("*"):
        yield node.tag
        for nxt in node:
            if nxt.tag == my_tag:
                yield nxt.tag
                return
            yield nxt.tag
            # iterate nxt directly; getchildren() was removed in Python 3.9
            if any(ch.tag == my_tag for ch in nxt):
                yield my_tag
                return

and run it on the file:

In [24]: root = tree.getroot()

In [25]: my_tag = 'neighbor'

In [26]: list(look_through_structure(root, my_tag))
Out[26]: ['data', 'country', 'neighbor']
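
Note that the generator above yields every tag it visits before the match, so with deeper nesting the result may contain tags that are not on the path. A recursive variant might look like the sketch below; find_path is a hypothetical helper name, and the inline XML is only a stand-in for country_data.xml:

```python
import xml.etree.ElementTree as ET

def find_path(node, my_tag, path=()):
    """Return the tuple of tags from node down to the first my_tag, or None."""
    path = path + (node.tag,)
    if node.tag == my_tag:
        return path
    for child in node:
        found = find_path(child, my_tag, path)
        if found is not None:
            return found
    return None

# Small inline sample standing in for country_data.xml (an assumption).
sample = ET.fromstring(
    '<data><country name="Liechtenstein"><rank>1</rank>'
    '<neighbor name="Austria" direction="E"/></country></data>'
)
path = find_path(sample, 'neighbor')
print('/' + '/'.join(path))  # /data/country/neighbor
```

Because the path is only extended on the way down and returned on the first match, sibling tags such as rank never show up in the result.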

Also, if you just want the full path, lxml's getpath will do that for you:

import lxml.etree as ET

tree = ET.parse('country_data.xml')

my_tag = 'neighbor'

# find() returns the first matching element; getpath() builds its absolute XPath
print(tree.getpath(tree.find(".//" + my_tag)))

Output:

/data/country[1]/neighbor[1]
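
If you would rather have the plain /data/country/neighbor form from the question, one option is to strip the positional predicates from that string afterwards. This is a minimal sketch; strip_indices is a hypothetical helper, and it assumes the path only contains numeric predicates like [1]:

```python
import re

def strip_indices(xpath):
    """Remove positional predicates such as [1] from an XPath string."""
    return re.sub(r'\[\d+\]', '', xpath)

print(strip_indices('/data/country[1]/neighbor[1]'))  # /data/country/neighbor
```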

Answer 2

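@Padraic. Thank you very much! Your code is mostly what I wanted. However, if I insert another node (e.g. attributes) that is a child of the country node and the parent of the neighbor nodes, it gives unexpected results: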

<data>
<country name="Liechtenstein">
<attributes>
    <rank>1</rank>
    <year>2008</year>
    <gdppc>141100</gdppc>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
    </attributes>
</country>
<country name="Singapore">
<attributes>
    <rank>4</rank>
    <year>2011</year>
    <gdppc>59900</gdppc>
    <neighbor name="Malaysia" direction="N"/>
    </attributes>
</country>
<country name="Panama">
<attributes>
    <rank>68</rank>
    <year>2011</year>
    <gdppc>13600</gdppc>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
    </attributes>
</country>
</data>

Anyway, your help was very valuable. I took your code and created this:

import lxml.etree as et
root = et.parse('country_data.xml')

out_f = open('getpath.txt', 'w')

my_str1 = 'country[1]'
my_str2 = 'neighbor[1]'

for e in root.iter():
    s = root.getelementpath(e)
    if my_str1 not in s:
        continue
    if my_str2 not in s:
        continue
    out_f.write('%s\n' %(s))
    break
out_f.close()

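The idea is simple: if the element path contains the strings 'country[1]' and 'neighbor[1]', it is written to the output file. For the original XML example it gives: country[1]/neighbor[1]. For the XML with the additional parent it gives: country[1]/attributes/neighbor[1].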
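
For the real 8 GB file, parsing the whole tree first may need too much memory. A streaming approach with xml.etree.ElementTree.iterparse could avoid that by stopping at the first match. The following is only a sketch under that assumption: first_path is a hypothetical helper name, and the in-memory sample stands in for the real file (iterparse also accepts a filename):

```python
import io
import xml.etree.ElementTree as ET

def first_path(source, my_tag):
    """Stream source and return the path to the first my_tag element, or None."""
    stack = []  # tags of the elements that are currently open
    for event, elem in ET.iterparse(source, events=('start', 'end')):
        if event == 'start':
            stack.append(elem.tag)
            if elem.tag == my_tag:
                return '/' + '/'.join(stack)
        else:
            stack.pop()
            elem.clear()  # drop the finished element's children and text

    return None

# In-memory sample with the extra "attributes" level (stand-in for the real file).
sample = io.BytesIO(
    b'<data><country name="Liechtenstein"><attributes><rank>1</rank>'
    b'<neighbor name="Austria" direction="E"/></attributes></country></data>'
)
print(first_path(sample, 'neighbor'))  # /data/country/attributes/neighbor
```

The stack mirrors the chain of open elements, so it works regardless of how many levels sit between country and neighbor, and only the part of the file before the first match is read.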

How can I generate the XML structure down to a particular XML node using Python?
