Question

Suppose I have this markup:

<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>

How can I select the nodeValue of the p element while excluding the a element and its contents?

My current code:

$result = $xpath->query("//p[@dataname='description'][not(self::a)]");

I read the value with $result->item(0)->nodeValue;

Answer 1

Simply appending /text() to your query should do the trick:

$result = $xpath->query("//p[@dataname='description'][not(self::a)]/text()");
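For context, here is a minimal end-to-end sketch of that approach, assuming the markup from the question is loaded with DOMDocument::loadHTML. The variable names $html, $doc, $nodes and $description are illustrative and not from the original post. The [not(self::a)] predicate is left out because a p element is never an a element, so it filters nothing. Since /text() can return more than one text node (the text before the link and any whitespace after it), the sketch joins them instead of reading only item(0).

```php
<?php
// Markup from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// /text() selects only the text nodes that are direct children of <p>.
// The link text is a child of <a>, so it is not selected.
$nodes = $xpath->query("//p[@dataname='description']/text()");

// Join all matched text nodes, then trim the surrounding whitespace.
$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, PHP_EOL; // Hello this is a description.
```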

Answer 2

I'm not sure whether PHP's XPath supports this, but this XPath did the trick for me in Scrapy (a Python-based scraping framework):

$result = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");
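To illustrate what the following-sibling::a predicate changes, here is a small sketch using hypothetical markup (not the markup from the question) that also has text after the link. The predicate keeps only text nodes that are followed by an a sibling, so trailing text is dropped, whereas plain /text() returns both text nodes. PHP's DOMXPath implements XPath 1.0, which includes the following-sibling axis.

```php
<?php
// Hypothetical markup with text on both sides of the link.
$html = '<p dataname="description">Intro text. <a href="#">More</a> Trailing text.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Only text nodes that have an <a> somewhere after them among their siblings.
$before = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

// All text nodes that are direct children of <p>, for comparison.
$all = $xpath->query("//p[@dataname='description']/text()");

$beforeText = trim($before->item(0)->nodeValue);

echo $before->length, ' vs ', $all->length, PHP_EOL; // 1 vs 2
echo $beforeText, PHP_EOL;                           // Intro text.
```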

If that doesn't work, try Kristoffer's solution, or you could use a regular-expression solution instead. For example:

$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $result->item(0)->nodeValue);

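This removes every HTML tag together with whatever it encloses, leaving only the text that is not wrapped in a tag. Note that nodeValue already returns the text with all markup stripped, so the pattern only has something to match when it is applied to the element's serialized inner HTML rather than to nodeValue.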
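A rough sketch of the regex route, assuming the pattern is run on the inner HTML of the p element (built here by serializing each child with DOMDocument::saveHTML) rather than on nodeValue. The variable names $p, $inner and $output are illustrative. The pattern is fragile: it assumes simple, non-nested tags like the single link in the question, so the XPath approaches are likely the safer choice.

```php
<?php
// Markup from the question.
$html = '<p dataname="description">
Hello this is a description. <a href="#">Click here for more.</a>
</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$p = $xpath->query("//p[@dataname='description']")->item(0);

// Build the inner HTML of <p> by serializing its child nodes.
// nodeValue cannot be used here because it has already dropped the tags.
$inner = '';
foreach ($p->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

// Remove each opening tag, its contents, and the next tag that follows.
$output = trim(preg_replace("~<.*?>.*?<.*?>~msi", '', $inner));

echo $output, PHP_EOL; // Hello this is a description.
```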