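I am looking for an XPath expression that extracts "sets" of items as separate sequences. It must be interpreted by Python's lxml (which is a wrapper around libxml2).
For example, given the following: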
<root>
<sub1>
<sub2>
<Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container>
</sub2>
<sub2>
<Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container>
</sub2>
<sub2>
<Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container>
</sub2>
</sub1>
</root>
I would like to be able to "isolate" each group or sequence. For example, sequence 1 is:
1 - My laptop has exploded again
2 - This is an issue which needs to be fixed.
The second sequence:
3 - It's still not working
4 - do we have a working IT department or what?
The third sequence:
5 - Never mind - I got my 8 year old niece to fix it
where a "sequence" would translate into pseudocode/Python as:
seq1 = ['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.']
seq2 = ["3 - It's still not working", '4 - do we have a working IT department or what?']
seq3 = ['5 - Never mind - I got my 8 year old niece to fix it']
From some preliminary research it seems that sequences can't be nested, but I am wondering whether there is some black magic that can be worked with these operators.
Answer 0 (score: 1)
Evaluate this XPath expression:
count(/*/*/*)
This finds the number of <sub2> elements. An equivalent, more readable, but longer expression is:
count(/*/sub1/sub2)
For each $n from 1 to count(/*/*/*), evaluate the following XPath expression:
/*/*/*[$n]/*/item/text()
Again, this is equivalent to the longer and more readable:
/*/sub1/sub2[$n]/Container/item/text()
Before evaluating the above expression, replace $n with its actual value (for example, by using the format() method on the string).
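The substitution step can be sketched in plain Python as follows. The names `template`, `count` and `expressions` are illustrative only, and the count is hard-coded to 3 on the assumption that count(/*/*/*) has already been evaluated for the sample document.

```python
# Template with a format() placeholder where the XPath variable $n would go.
template = "/*/*/*[{n}]/*/item/text()"

# Assumed result of evaluating count(/*/*/*) on the sample document.
count = 3

# XPath positions are 1-based, so n runs from 1 to count inclusive.
expressions = [template.format(n=n) for n in range(1, count + 1)]

for expression in expressions:
    print(expression)
```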
For the provided XML document, count(/*/*/*) is 3, so the actual XPath expressions evaluated are:
/*/*/*[1]/*/item/text()
/*/*/*[2]/*/item/text()
/*/*/*[3]/*/item/text()
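A minimal sketch of running this procedure with lxml, assuming the sample document is available as a byte string (the names `XML`, `root`, `total` and `sequences` are illustrative). Instead of substituting the value into the string, this version binds $n as an XPath variable through a keyword argument to xpath(), and writes the predicate as position() = $n to make the positional test explicit.

```python
from lxml import etree

# The sample document from the question, inlined so the sketch is self-contained.
XML = b"""<root>
<sub1>
<sub2><Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container></sub2>
<sub2><Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container></sub2>
<sub2><Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container></sub2>
</sub1>
</root>"""

root = etree.fromstring(XML)

# count() yields an XPath number, which lxml returns as a float.
total = int(root.xpath("count(/*/*/*)"))

# Evaluate the per-group expression once for each 1-based position.
sequences = [
    root.xpath("/*/*/*[position() = $n]/*/item/text()", n=n)
    for n in range(1, total + 1)
]

for sequence in sequences:
    print(sequence)
```

Binding the variable avoids rebuilding the expression string on every iteration; string substitution with format() would work equally well here.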
Each of them produces one of the following collections of results (the concrete type depends on the language: array, sequence, collection, IEnumerable<string>, etc.):
"1 - My laptop has exploded again", "2 - This is an issue which needs to be fixed."
"3 - It's still not working", "4 - do we have a working IT department or what?"
"5 - Never mind - I got my 8 year old niece to fix it"
Answer 1 (score: 0)
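from lxml import etree

# Parse the document; data.xml holds the XML from the question.
doc = etree.parse("data.xml")
# Paths given to findall() are relative to the root element.
containers = doc.findall('sub1/sub2/Container')
final_result = []
for container in containers:
    # Collect the text of every <item> inside this <Container>.
    sequence = [item.text for item in container.findall('item')]
    final_result.append(sequence)
print(final_result)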
This is the result:
[['1 - My laptop has exploded again', '2 - This is an issue which needs to be fixed.'], ["3 - It's still not working", '4 - do we have a working IT department or what?'], ['5 - Never mind - I got my 8 year old niece to fix it']]
I am assuming the data is in a file named "data.xml" in the same directory as the script containing the code above.
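For trying the same idea without a data.xml file, here is a hedged variant that parses the sample from an inline string and uses a relative XPath expression on each <Container> instead of findall(). The names `XML`, `root` and `sequences` are illustrative and not part of the original answer.

```python
from lxml import etree

# The sample document from the question, inlined so no data.xml file is needed.
XML = b"""<root>
<sub1>
<sub2><Container>
<item>1 - My laptop has exploded again</item>
<item>2 - This is an issue which needs to be fixed.</item>
</Container></sub2>
<sub2><Container>
<item>3 - It's still not working</item>
<item>4 - do we have a working IT department or what?</item>
</Container></sub2>
<sub2><Container>
<item>5 - Never mind - I got my 8 year old niece to fix it</item>
</Container></sub2>
</sub1>
</root>"""

root = etree.fromstring(XML)

# Select each <Container>, then evaluate a relative XPath from that node
# so every group of items ends up in its own list.
sequences = [
    container.xpath("item/text()")
    for container in root.xpath("/root/sub1/sub2/Container")
]

print(sequences)
```

Because the inner expression is relative to each <Container>, the grouping comes from the Python loop rather than from XPath itself, which sidesteps the fact that XPath 1.0 node-sets cannot be nested.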