Question

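I am looking for an XPath expression that extracts 'sets' as separate sequences. It must be interpretable by Python's lxml (which is a wrapper around libxml2).

For example, given the following: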

<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>

I would like to be able to 'isolate' each group or sequence. For example, the first sequence is:

1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.

The second sequence:

3 - It's still not working
4 - do we have a working IT department or what?

The third sequence:

5 - Never mind - I got my 8 year old niece to fix it

Here a 'sequence' would translate into pseudocode / Python as:

seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']

From some preliminary research it seems that sequences can't be nested, but I'm wondering whether there is some black magic that can be done with these operators.

Answer 1

Evaluate this XPath expression:

count(/*/*/*)

This finds the number of <sub2> elements. An equivalent, more readable, but longer expression is:

count(/*/sub1/sub2)

For each $n from 1 to count(/*/*/*), evaluate the following XPath expression:

/*/*/*[$n]/*/item/text()

Again, this is equivalent to the longer, more readable:

/*/sub1/sub2[$n]/Container/item/text()

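Before evaluating the above expression, substitute the actual value of $n into it (for example, by using the format() method on strings).

For the provided XML document the count is 3, so $n takes the values 1, 2 and 3, and the actual XPath expressions evaluated are: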

/*/*/*[1]/*/item/text()

,

/*/*/*[2]/*/item/text()

,

/*/*/*[3]/*/item/text()

Respectively, these produce the following collections (the concrete type depends on the language: array, sequence, set, IEnumerable<string>, and so on):

"1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."

,

"3 - It's still not working", "4 - do we have a working IT department or what?"

,

"5 - Never mind - I got my 8 year old niece to fix it"
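The steps above could be sketched in Python with lxml roughly as follows. This is only an illustration under a few assumptions: the XML is supplied as an inline string rather than a file, and the names XML, root, count and sequences are made up for the example. Note that lxml returns the result of count() as a float, so it is converted to an int before looping.

```python
from lxml import etree

# Assumption: the sample document from the question, inlined as a string.
XML = """<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>3 - It's still not working</item>
                <item>4 - do we have a working IT department or what?</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>"""

root = etree.fromstring(XML)

# Step 1: count the <sub2> elements (XPath numbers come back as floats).
count = int(root.xpath('count(/*/sub1/sub2)'))

# Step 2: for each $n from 1 to count, substitute n into the expression
# with str.format() and evaluate it; each result is one flat list of strings.
sequences = [
    root.xpath('/*/sub1/sub2[{}]/Container/item/text()'.format(n))
    for n in range(1, count + 1)
]

for n, seq in enumerate(sequences, start=1):
    print(n, list(seq))
```

Each individual XPath call still returns a flat list; the nesting into separate sequences happens in the Python loop, not inside XPath.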

Answer 2

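from lxml import etree

# Parse the XML file (assumed to be "data.xml" in the working directory).
doc = etree.parse("data.xml")
# Each <Container> element holds one group of <item> elements.
containers = doc.findall('sub1/sub2/Container')
finalResult = []
for container in containers:
    # Collect the text of every <item> in this container as one sequence.
    sequence = [item.text for item in container.findall('item')]
    finalResult.append(sequence)
print(finalResult)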

This is the result:

[['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.'], ["3 - It's still not working", '4 - do we have a working IT department or what?'], ['5 - Never mind - I got my 8 year old niece to fix it']]

Note

I am assuming the data is in a file named "data.xml" in the same directory as the script containing the above code.
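If a file on disk is not convenient, a similar grouping can probably be done with lxml's xpath() method on an in-memory document. The sketch below is an assumption-laden variant, not the code from this answer: the shortened inline XML string and the names XML, root, containers and sequences are hypothetical. It selects every <Container> once and then evaluates a relative XPath against each one.

```python
from lxml import etree

# Assumption: a shortened copy of the sample document, inlined as a string.
XML = """<root>
    <sub1>
        <sub2>
            <Container>
                <item>1 - My laptop has exploded again</item>
                <item>2 - This is an issue which needs to be fixed.</item>
            </Container>
        </sub2>
        <sub2>
            <Container>
                <item>5 - Never mind - I got my 8 year old niece to fix it</item>
            </Container>
        </sub2>
    </sub1>
</root>"""

root = etree.fromstring(XML)

# One absolute XPath selects all <Container> elements in document order.
containers = root.xpath('/*/sub1/sub2/Container')

# A relative XPath, evaluated with each container as the context node,
# yields that container's item texts as its own list.
sequences = [container.xpath('item/text()') for container in containers]

print(sequences)
```

This avoids counting the <sub2> elements up front, at the cost of one extra xpath() call per container.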
