Question

I'm new to programming, so I may be missing some basics somewhere.

I have an XML file:

<mother>
<daughter nr='1' state='nice' name='Ada'/>
<daughter nr='2' state='naughty' name='Beta'/>
<daughter nr='3' state='nice' name='Cecilia'/>
<daughter nr='4' state='neither' name='Dora'/>
<daughter nr='5' state='naughty' name='Elis'/>
</mother>

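What I need is to match the nice and the naughty daughters by their numbers (each nice daughter with her nearest naughty sister) and print the pairs: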

Ada Beta  
Cecilia Elis

My code:

import libxml2, sys

doc = libxml2.parseFile("file.xml")
tree = doc.xpathNewContext()

nice = tree.xpathEval("//daughter[@state='nice']")

# print the names of the nice daughters
for l in nice:
    print(l.prop("name"))

nice_nr = []
for n in nice:
    nice_nr.append(n.prop("nr"))

# and the same for the naughty daughters

doc.freeDoc()

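So I am able to get the values of their attributes, but I cannot figure out how to make the pairs.
What I could find is XPath's 'following-sibling' axis, but from all the examples I found I am not sure whether it can be used here. The syntax is rather different, and it takes all the following siblings. Any help is appreciated.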

Answer 1

Use:

/*/daughter[@state = 'nice'][1] | /*/daughter[@state = 'nice'][1]/following-sibling::daughter[@state = 'naughty'][1]

This selects the first nice daughter and the nearest naughty daughter that follows her.

To select the second pair, use:

/*/daughter[@state = 'nice'][2] | /*/daughter[@state = 'nice'][2]/following-sibling::daughter[@state = 'naughty'][1]

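...and so on.

Note that these expressions do not guarantee that any node is selected at all: there may be no daughter elements, or a nice daughter element may have no following naughty daughter sibling.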
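The per-pair expressions above can be looped over in Python. The following is only a minimal sketch: it uses the standard library's xml.etree.ElementTree instead of libxml2, and because ElementTree does not support the following-sibling axis, the axis is emulated by scanning the siblings that come after each nice daughter. The function name nearest_following_pairs and the XML constant are made up for illustration, and the sample assumes self-closed daughter elements.

```python
import xml.etree.ElementTree as ET

# Sample document from the question (daughter elements self-closed).
XML = """<mother>
<daughter nr='1' state='nice' name='Ada'/>
<daughter nr='2' state='naughty' name='Beta'/>
<daughter nr='3' state='nice' name='Cecilia'/>
<daughter nr='4' state='neither' name='Dora'/>
<daughter nr='5' state='naughty' name='Elis'/>
</mother>"""


def nearest_following_pairs(xml_text):
    """Pair each nice daughter with the first naughty sibling after her."""
    root = ET.fromstring(xml_text)
    daughters = root.findall("daughter")  # direct children, document order
    pairs = []
    for i, node in enumerate(daughters):
        if node.get("state") != "nice":
            continue
        # Emulates following-sibling::daughter[@state='naughty'][1]
        partner = next(
            (s for s in daughters[i + 1:] if s.get("state") == "naughty"),
            None,
        )
        # A nice daughter without a following naughty sibling is skipped.
        if partner is not None:
            pairs.append((node.get("name"), partner.get("name")))
    return pairs


for nice_name, naughty_name in nearest_following_pairs(XML):
    print(nice_name, naughty_name)
```

With an XPath 1.0 engine such as libxml2, the same loop could instead evaluate the following-sibling expression relative to each nice node.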

If the document guarantees that the daughter elements strictly alternate ('nice', 'naughty'), then a very simple XPath expression gets all the pairs:

/*/daughter[@state = 'nice' or @state = 'naughty']

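This selects all daughter elements that are children of the top element and whose state attribute alternates in value: nice, naughty, nice, naughty, ...

If the XPath API in use returns the result as an array of objects, then for every even k the pair of daughters is in the k-th and (k + 1)-th members of the array.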
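As a small sketch of that index arithmetic, assuming 0-based indexing and a result that already alternates nice, naughty, nice, naughty: the helper name pair_alternating and the list of names are made up for illustration, standing in for whatever the XPath API returns.

```python
def pair_alternating(items):
    """Group items as (items[k], items[k + 1]) for every even k."""
    return [(items[k], items[k + 1]) for k in range(0, len(items) - 1, 2)]


# Names in the order the expression would select them for the sample
# document (Dora is excluded because her state is 'neither').
selected = ["Ada", "Beta", "Cecilia", "Elis"]

for nice_name, naughty_name in pair_alternating(selected):
    print(nice_name, naughty_name)
```

A trailing unpaired item is ignored, which matches the case where the last nice daughter has no naughty partner.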

Answer 2

Each XPath expression returns an ordered list of nodes. Just zip the lists together to find the corresponding pairs:

xpath = lambda state: tree.xpathEval("//daughter[@state='%s']" % state)
for nodes in zip(xpath('nice'), xpath('naughty')):
    print(' '.join(n.prop('name') for n in nodes))

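Above, xpath is a function that evaluates an XPath expression returning the daughters that match the given state. The two lists are then passed to zip, which returns tuples of the i-th element from each list.

If the daughters are not listed in order in the XML file, you can sort the nodes by their nr attribute before passing them to zip.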
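A minimal sketch of sorting by nr before zipping, assuming the daughters appear out of order. It uses the standard library's xml.etree.ElementTree rather than libxml2, so the calls differ from the snippet above (get instead of prop, findall instead of xpathEval); the SHUFFLED document and the by_state helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same daughters as in the question, deliberately out of order.
SHUFFLED = """<mother>
<daughter nr='5' state='naughty' name='Elis'/>
<daughter nr='3' state='nice' name='Cecilia'/>
<daughter nr='2' state='naughty' name='Beta'/>
<daughter nr='4' state='neither' name='Dora'/>
<daughter nr='1' state='nice' name='Ada'/>
</mother>"""

root = ET.fromstring(SHUFFLED)


def by_state(state):
    """Return the daughters with the given state, sorted numerically by nr."""
    nodes = root.findall("daughter[@state='%s']" % state)
    # int() matters: as strings, '10' would sort before '2'.
    return sorted(nodes, key=lambda n: int(n.get("nr")))


pairs = [
    (a.get("name"), b.get("name"))
    for a, b in zip(by_state("nice"), by_state("naughty"))
]

for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

Note that zip pairs the i-th nice daughter with the i-th naughty one, which coincides with "nearest following" for this sample but not for every possible ordering of states.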

Answer 3

I have a solution without XPath. It also takes the ordering of the daughters by number into account. The document is traversed only once.

from lxml.etree import fromstring

data = """the-xml-above""" 

def fetch_sorted_daughters(data):
    # load data into xml document
    doc = fromstring(data)
    nice = []
    naughty = []

    # extract (number, name) pairs
    for subelement in doc:
        if subelement.tag == 'daughter':
            nr = int(subelement.get('nr'))  # compare numerically, not as strings
            name = subelement.get('name')
            if subelement.get('state') == 'nice':
                nice.append((nr, name))
            elif subelement.get('state') == 'naughty':
                naughty.append((nr, name))
    del doc  # release document

    # sort pairs by number
    nice.sort(key=lambda x: x[0])
    naughty.sort(key=lambda x: x[0])

    # get sorted names from pairs
    nice = tuple(pair[1] for pair in nice)
    naughty = tuple(pair[1] for pair in naughty)

    return nice, naughty

nice, naughty = fetch_sorted_daughters(data)
pairs = zip(nice, naughty)

print(list(pairs))

Match siblings according to tag attributes in XML with Python and libxml2
