Question

The goal is to exclude the text of the last element when that text is wrapped in <p><em>...</em></p>, as in the code below:

...
<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
<p><em>Felicity Spector is a deputy programme editor for <a href="http://www.channel4.com/news/" onclick="window.open(this.href);return false;" onkeypress="window.open(this.href);return false;">Channel 4 News</a></em>.</p>

The output should be:

"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.

So I need to get everything except the text of the last <p> tag, whenever there is no other text between the <p> and <em> tags and no other text between the </em> and </p> tags, as in the example above.

I am using

//p[normalize-space()]

but it returns everything, including the text in the last <p><em>...</em></p> tag:

"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."
Liberals were mostly delighted by what the Washington Post called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".
This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.
Felicity Spector is a deputy programme editor for Channel 4 News

The last line should be excluded.

Thanks for any hints.

UPD

Example 1. The following text should be returned if it is in the last <p> (because not all of the text is inside <em>):

<p>I was once on a travelling sanitation carnival in Uttar Pradesh when someone rushed up to me. “Rose! Rose! There’s a real ‘no loo no I do’!” If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour.  But because bringing a toilet into the world, when there are still 2.4 billion people without one, is by any reckoning a very happy ending.</p>

Example 2. The following text should NOT be returned if it is in the last <p> (because all of the text is inside <em>):

<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>

Answer 1

It is not entirely clear what the rules are, but here is a suggestion that you can comment on:

//p[normalize-space() and not(position() = last() and em)]

which translates to

//p                           find all `p` elements anywhere in the document
[normalize-space()            but only if they contain at least 1 character that is not white space
and not(position() = last()   and leave out a `p` element if it is the last `p` child of its parent
and em)]                      and, at the same time, has a child element named `em`

and returns the following (individual results are separated by -------):

<p>"We are a better country because of these commitments," he said. "I'll go further – we would not be a great country without [them]."</p>
-----------------------
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence [Obama] may ever have attempted of American liberalism and of what it means to be a Democrat".</p>
-----------------------
<p>This was the Obama many of them hoped for when they voted him into office on that wave of enthusiasm back in 2008.</p>
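
The selection logic can also be checked outside an XPath engine. The following is only a minimal sketch: Python's standard `xml.etree.ElementTree` supports a limited XPath subset that cannot evaluate `normalize-space()` or `not(...)`, so the conditions are re-implemented by hand instead of evaluating the expression itself. The names `SAMPLE`, `text_of` and `select_paragraphs`, the wrapping `<div>`, and the shortened paragraph texts are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated stand-in for the question's markup, wrapped in a hypothetical <div> root.
SAMPLE = """<div>
<p>"We are a better country because of these commitments," he said.</p>
<p>Liberals were mostly delighted by what the <em>Washington Post</em> called "the most ambitious defence".</p>
<p>This was the Obama many of them hoped for.</p>
<p><em>Felicity Spector is a deputy programme editor for <a href="http://www.channel4.com/news/">Channel 4 News</a></em>.</p>
</div>"""


def text_of(elem):
    """All descendant text with whitespace collapsed (approximates normalize-space())."""
    return " ".join("".join(elem.itertext()).split())


def select_paragraphs(root):
    """Mimic //p[normalize-space() and not(position() = last() and em)]."""
    selected = []
    for parent in root.iter():
        paragraphs = parent.findall("p")  # the p children of this parent only
        for p in paragraphs:
            is_last = p is paragraphs[-1]            # position() = last()
            has_em_child = p.find("em") is not None  # em
            if text_of(p) and not (is_last and has_em_child):
                selected.append(p)
    return selected


root = ET.fromstring(SAMPLE)
for p in select_paragraphs(root):
    print(text_of(p))
```

On this flat sample the sketch prints the first three paragraphs and leaves out the last one. Results are grouped per parent, so for nested documents the order may differ from the document order an XPath engine would return.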

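Caveat: if the structure of your document is actually more complex, this might go wrong in places, for example when p elements appear at arbitrary levels of the hierarchy.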

Perhaps normalize-space is not needed, but I always use it.

Use it only if you really want to exclude elements that contain nothing but whitespace.
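
To see what the predicate tests, here is a small sketch that approximates XPath's `normalize-space()` in Python. The helper name `normalize_space` and the sample strings are assumptions for illustration. The function trims the string and collapses runs of space, tab, CR and LF into a single space, and an empty result counts as false in a predicate.

```python
import re


def normalize_space(s):
    """Approximate XPath normalize-space(): collapse runs of space, tab,
    CR and LF to one space, then drop leading and trailing spaces."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


samples = ["  \n\t ", "", " Channel 4\n News "]
for s in samples:
    value = normalize_space(s)
    # An empty string is falsy, which is why whitespace-only elements are filtered out.
    print(repr(value), bool(value))
```

Only the third sample survives the filter; the first two normalize to the empty string.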

I need to exclude the text in the last p only if the entire text is enclosed in em

Well, in your own example that is not the case for the last p element: the final . is actually outside em, so it is a text node of p.
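
This can be verified with a short sketch using Python's standard `xml.etree.ElementTree`, where text that follows a child's closing tag is stored in that child's `tail` attribute. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    '<p><em>Felicity Spector is a deputy programme editor for '
    '<a href="http://www.channel4.com/news/">Channel 4 News</a></em>.</p>'
)
em = p.find("em")

print(repr(p.text))   # text of p before <em>: there is none
print(repr(em.tail))  # text after </em>: the final dot, a text node of p
print("".join(em.itertext()))  # text inside em, which does not include the dot
```

The dot appears as `em.tail`, so it belongs to p and not to em.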

EDIT in response to comments:

I have just found that if the last p contains text of its own, the XPath does not return that text. I have updated the question with an example

Then use the path expression below:

//p[normalize-space() and not(position() = last() and em and not(text()))]

Sorry, I probably misunderstood you until now. not(text()) checks whether p has no text-node children of its own, that is, no text outside of em.
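
As a rough check of the extra condition, the sketch below re-implements `em and not(text())` by hand with Python's standard `xml.etree.ElementTree`, which cannot evaluate this expression itself. The helpers `has_own_text` and `keep_last_paragraph` and the shortened paragraph texts are assumptions for illustration, and each paragraph is treated as if it were the last p of its parent.

```python
import xml.etree.ElementTree as ET


def has_own_text(p):
    """True if p has a text-node child of its own, i.e. text() selects something."""
    return bool(p.text) or any(child.tail for child in p)


def keep_last_paragraph(p):
    """Mimic not(em and not(text())) for a p already known to be the last one."""
    return not (p.find("em") is not None and not has_own_text(p))


# Shortened versions of the paragraphs discussed in the question.
example_1 = ET.fromstring("<p>If that’s the story of <em>Ek Prem Katha</em>, then I’m all in favour.</p>")
example_2 = ET.fromstring("<p><em>Sophie Elmhirst is an assistant editor of the NS</em></p>")
trailing_dot = ET.fromstring("<p><em>Felicity Spector is a deputy programme editor</em>.</p>")

for name, p in [("example 1", example_1), ("example 2", example_2), ("trailing dot", trailing_dot)]:
    print(name, "kept" if keep_last_paragraph(p) else "excluded")
```

Example 1 is kept and example 2 is excluded, as requested. The paragraph with the trailing dot is kept as well, because the dot counts as a text node of p. Whitespace-only text nodes also count as text in XPath, so stray whitespace around em can have the same effect, depending on how the parser handles it.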
