XPath: exclude the text of the last element if it is wrapped in <p><em>...</em></p>
...
<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
<p><em>Felicity Spector is a deputy programme editor for <a href="http://www.channel4.com/news/" onclick="window.open(this.href);return false;" onkeypress="window.open(this.href);return false;">Channel 4 News</a></em>.</p>
The output should be:
"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.
So I need to get everything except the text of the last <p>, but only when that text is wrapped in <em>, with no other text between <p> and <em> and no other text between </em> and </p>, as in the example above.
I am using
//p[normalize-space()]
but it returns everything, including the text in the last <p><em>...</em></p>:
"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.
Felicity Spector is a deputy programme editor for Channel 4 News
The last paragraph should be excluded.
Thanks for any hints.
UPD
Example 1. If the following text is in the last <p>, it should be returned (because not all of the text is inside <em>):
<p>I was once on a travelling sanitation carnival in Uttar Pradesh when someone rushed up to me. “Rose! Rose! There’s a real ‘no loo no I do’!” If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour. But because bringing a toilet into the world, when there are still 2.4 billion people without one, is by any reckoning a very happy ending.</p>
Example 2. If the following text is in the last <p>, it should not be returned (because all of the text is inside <em>):
<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>
Answer 0 (score: 1)
It's not clear what exactly the rules are, but here is a suggestion that you can comment on:
//p[normalize-space() and not(position() = last() and em)]
which translates to
//p                           find all `p` elements anywhere in the document
[normalize-space()            but only if they contain at least 1 character that is not white-space
and not(position() = last()   and not if the `p` element is the last `p` child of its parent
and em)]                      and, at the same time, has a child element named `em`
and returns the following results (individual results are separated by -------):
<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
-----------------------
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
-----------------------
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
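Caveat: if the structure of your document is actually more complex, this might go wrong in some places, for example when `p` elements occur at any level of the hierarchy.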
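The predicate and the caveat can also be read procedurally. Below is a minimal sketch in Python that uses only the standard library's `xml.etree.ElementTree`; its limited XPath subset has no `normalize-space()`, so each condition is written out by hand. The function name `select_paragraphs`, the wrapping `<body>`/`<div>` elements and the shortened sample text are assumptions made for illustration, and `strip()` only approximates XPath whitespace handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two containers, each ending in a <p> that has an <em> child.
SAMPLE = """<body>
  <div>
    <p>First paragraph.</p>
    <p>Quote from <em>a newspaper</em>.</p>
  </div>
  <div>
    <p>Third paragraph.</p>
    <p><em>Author byline</em>.</p>
  </div>
</body>"""


def select_paragraphs(root):
    """Approximate //p[normalize-space() and not(position() = last() and em)]."""
    selected = []
    for parent in root.iter():
        # position() and last() are counted among the p children of ONE parent.
        paragraphs = parent.findall("p")
        for index, p in enumerate(paragraphs, start=1):
            has_text = "".join(p.itertext()).strip() != ""  # normalize-space()
            is_last = index == len(paragraphs)              # position() = last()
            has_em = p.find("em") is not None               # em
            if has_text and not (is_last and has_em):
                selected.append(p)
    return selected


root = ET.fromstring(SAMPLE)
texts = ["".join(p.itertext()) for p in select_paragraphs(root)]
print(texts)
```

In this sample the second paragraph is dropped as well, although it is not the last `p` of the document: it is the last `p` child of its own parent and has an `em` child. That is the kind of surprise the caveat refers to.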
> Perhaps normalize-space is not needed but I always use it.
Only use it if you really want to exclude elements that contain nothing but whitespace.
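> I need to exclude the text in the last p only if the whole text is enclosed in em
Well, in your own example that is not the case for the last `p` element: the final `.` is actually outside `em`; it is a text node of `p`.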
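Edit in response to a comment:
> I just found that the XPath does not return the text if there is text in the last p. I've updated the example.
Then use the path expression below: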
//p[normalize-space() and not(position() = last() and em and not(text()))]
Sorry, I may have misunderstood you until now. `not(text())` checks whether `p` itself has any child text node outside of `em`.
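To see how the extra `not(text())` condition behaves on the two examples from the question, here is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`, with the predicate written out by hand. The helper names `select_paragraphs` and `texts_of`, the wrapping `<div>`, the `Intro.` paragraph and the shortened sample texts are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def select_paragraphs(root):
    """Approximate
    //p[normalize-space() and not(position() = last() and em and not(text()))]
    """
    selected = []
    for parent in root.iter():
        paragraphs = parent.findall("p")
        for index, p in enumerate(paragraphs, start=1):
            has_text = "".join(p.itertext()).strip() != ""  # normalize-space()
            is_last = index == len(paragraphs)              # position() = last()
            has_em = p.find("em") is not None               # em
            # text(): text nodes that are direct children of p. In ElementTree
            # these are p.text and the tail of each child element.
            has_own_text = bool(p.text) or any(child.tail for child in p)
            if has_text and not (is_last and has_em and not has_own_text):
                selected.append(p)
    return selected


def texts_of(xml):
    root = ET.fromstring(xml)
    return ["".join(p.itertext()) for p in select_paragraphs(root)]


# Example 1: text outside <em>, so the last p is kept.
mixed = texts_of(
    "<div><p>Intro.</p><p>The story of <em>Ek Prem Katha</em> ends well.</p></div>"
)
# Example 2: all text inside <em>, so the last p is dropped.
only_em = texts_of(
    "<div><p>Intro.</p>"
    "<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p></div>"
)
# A trailing "." after </em> is a text node of p, so this last p is kept too.
trailing_dot = texts_of(
    "<div><p>Intro.</p>"
    "<p><em>Felicity Spector is a deputy programme editor</em>.</p></div>"
)
print(mixed)
print(only_em)
print(trailing_dot)
```

Note that `text()` also matches whitespace-only text nodes, so a last `p` with a line break or indentation around its `em` would be kept. If that matters for your documents, one possible variation is `not(text()[normalize-space()])`, which only counts text nodes that contain something other than whitespace.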