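I'm looking for an XPath expression that returns the content of the article without the references section. I want everything inside the article element up to the <p> tag that contains "References". My attempt so far: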
//root/main/article/following-sibling::p[.="References"]
<root>
<main>
<article>
<p>
The stunning increase in homelessness announced in Los Angeles
this week — up 16% over last year citywide — was an almost an
incomprehensible conundrum given the nation's booming economy
and the hundreds of millions of dollars that city, county and
state officials have directed toward the problem.
</p>
<p>
"We cannot let a set of difficult numbers discourage us
or weaken our resolve" Garcetti said.
</p>
<p>
References: Maeve Reston, CNN
</p>
</article>
</main>
</root>
The result I'm looking for is the following.
<p>
The stunning increase in homelessness announced in Los Angeles
this week — up 16% over last year citywide — was an almost an
incomprehensible conundrum given the nation's booming economy
and the hundreds of millions of dollars that city, county and
state officials have directed toward the problem.
</p>
<p>
"We cannot let a set of difficult numbers discourage us
or weaken our resolve" Garcetti said.
</p>
Answer 0 (score: 1)
This XPath,
/root/main/article/p[starts-with(normalize-space(),'References')]
/preceding-sibling::p
will select the paragraphs that precede the paragraph beginning with "References".
If you want only the text node children of these p elements, you can append /text().
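For readers who want to try the same selection from code, here is a minimal sketch in Python. It is not the XPath above evaluated directly: the standard library's xml.etree.ElementTree implements only a subset of XPath, without the preceding-sibling axis or the starts-with() and normalize-space() functions, so the sketch emulates the selection by walking the p children in order. The function name paragraphs_before_references and the abridged sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from itertools import takewhile

# Abridged stand-in for the document in the question (hypothetical text).
XML = """<root>
<main>
<article>
<p>First paragraph.</p>
<p>Second paragraph.</p>
<p>
References: Maeve Reston, CNN
</p>
</article>
</main>
</root>"""


def normalized_text(element):
    """Collapse whitespace in the element's text, like XPath normalize-space()."""
    return " ".join("".join(element.itertext()).split())


def paragraphs_before_references(xml_text):
    """Return the <p> children of root/main/article that come before the
    first <p> whose normalized text starts with 'References'."""
    root = ET.fromstring(xml_text)
    paragraphs = root.findall("./main/article/p")
    return list(
        takewhile(
            lambda p: not normalized_text(p).startswith("References"),
            paragraphs,
        )
    )


for p in paragraphs_before_references(XML):
    print(normalized_text(p))
```

Two differences from the XPath are worth keeping in mind. If no paragraph starts with "References", this sketch returns every paragraph, whereas the XPath selects nothing. If several paragraphs start with "References", the sketch stops at the first one, while the XPath selects everything before the last one.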