Question

I'm trying to determine whether the text of any P tag sits entirely inside strong/b tags.

// Match (unacceptable, flag to user):
<p><strong>Any text and <span>maybe</span> other <em>tags</em></strong></p>
// Don't match (acceptable):
<p>Any text and <strong>maybe</strong> other <em>tags</em></p>

Answer 1

Any p ...
//p
... that has at least one strong descendant node ...
//p[.//strong]
... where that strong has some text content other than whitespace ...
//p[.//strong[normalize-space(.) != ""]]

... and that has no descendant text node with non-whitespace content that lacks a strong ancestor:

//p[
  .//strong[normalize-space(.) != ""] and 
  not(.//text()[normalize-space(.) != "" and not(ancestor::strong)])
]

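This checks two conditions: first, that some actual content of the paragraph is inside a strong, and second, that no actual content is outside a strong. In other words, none of the word content is formatted differently.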

Example:

$html = <<<'HTML'
<p><strong>Any text and <span>maybe</span> other <em>tags</em></strong></p>
<p>Any text and <strong>maybe</strong> other <em>tags</em></p>
<p><strong>Builder's</strong> <strong>tea</strong></p>
<p><em><strong>Builder's</strong> <strong> tea</strong></em></p>
HTML;

// Parse the HTML fragment; libxml wraps it in html/body elements.
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$expression = 
  '//p[
      .//strong[normalize-space(.) != ""] and 
      not(.//text()[normalize-space(.) != "" and not(ancestor::strong)])
    ]';

// Dump the serialized markup of every paragraph that is entirely strong.
foreach ($xpath->evaluate($expression) as $p) {
  var_dump(
    $document->saveXML($p)
  );
}

Output:

string(75) "<p><strong>Any text and <span>maybe</span> other <em>tags</em></strong></p>" 
string(54) "<p><strong>Builder's</strong> <strong>tea</strong></p>" 
string(64) "<p><em><strong>Builder's</strong> <strong> tea</strong></em></p>"

The expression can also be extended to cover b:

//p[
   (
     .//strong[normalize-space(.) != ""] or
     .//b[normalize-space(.) != ""]
   ) and 
   not(
     .//text()[
       normalize-space(.) != "" and 
       not(ancestor::*[self::strong or self::b])
     ]
   )
]
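As a rough sketch of the extended expression in use, assuming the same DOMDocument/DOMXPath setup as in the example above; the sample paragraphs and the $flagged variable are made up for illustration:

```php
<?php
// Hypothetical sample markup, not taken from the question.
$html = <<<'HTML'
<p><b>All bold</b></p>
<p><strong>Mixed</strong> <b>bold tags</b></p>
<p>Plain text with <b>one</b> bold word</p>
<p>No bold at all</p>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Same logic as above: some real text sits inside strong or b,
// and no real text sits outside both of them.
$expression = '//p[
  (.//strong[normalize-space(.) != ""] or .//b[normalize-space(.) != ""]) and
  not(.//text()[normalize-space(.) != "" and not(ancestor::*[self::strong or self::b])])
]';

// Collect the serialized paragraphs that should be flagged.
$flagged = [];
foreach ($xpath->evaluate($expression) as $p) {
    $flagged[] = $document->saveXML($p);
}

print_r($flagged);
```

With this sample, only the first two paragraphs should end up in $flagged; the third has text outside any bold element and the fourth has no bold element at all.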

Answer 2

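The following code checks that the P tag contains no text or other HTML tags before or after a strong tag, which establishes that the P tag is entirely bold (strong).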

$false_headings = $xpath->query("//p/strong");

foreach ($false_headings as $heading) {
    // No sibling on either side means the strong is the only child node of its p.
    if ($heading->previousSibling === null && $heading->nextSibling === null) {
        // Report to user
        break;
    }
}
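For anyone who wants to try this, here is a self-contained sketch of the same sibling check. The document setup and the $reported array are my additions for illustration; it collects every offending paragraph instead of stopping at the first one:

```php
<?php
$html = <<<'HTML'
<p><strong>Any text and <span>maybe</span> other <em>tags</em></strong></p>
<p>Any text and <strong>maybe</strong> other <em>tags</em></p>
<p><strong>Builder's</strong> <strong>tea</strong></p>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$reported = [];
foreach ($xpath->query('//p/strong') as $heading) {
    // A strong without any sibling node is the only child of its p.
    if ($heading->previousSibling === null && $heading->nextSibling === null) {
        $reported[] = $document->saveXML($heading->parentNode);
    }
}

print_r($reported);
```

Note that the sibling test is strict: the third paragraph is not reported, because each of its strong elements has a sibling node, and whitespace around a single strong would also count as a sibling.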

Answer 3

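Here is one way, partly based on @gangabass's suggestion. It counts the <p> elements that contain only a single <strong> element, optionally surrounded by nothing but whitespace text.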

$unacceptableNodesCount = $xpath->evaluate( 'count(//p[count(*) = 1 and name(*) = "strong" and normalize-space() = string(strong)])' );

var_dump( $unacceptableNodesCount );
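A minimal runnable sketch of the above, assuming the HTML is loaded with DOMDocument::loadHTML; the sample markup is only illustrative:

```php
<?php
// Hypothetical sample: the first and third paragraphs should be counted.
$html = <<<'HTML'
<p><strong>Any text and <span>maybe</span> other <em>tags</em></strong></p>
<p>Any text and <strong>maybe</strong> other <em>tags</em></p>
<p> <strong>Only bold</strong> </p>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// count() makes evaluate() return a number (a float) rather than a node list.
$unacceptableNodesCount = $xpath->evaluate(
    'count(//p[count(*) = 1 and name(*) = "strong" and normalize-space() = string(strong)])'
);

var_dump($unacceptableNodesCount);
```

One caveat worth keeping in mind: the comparison uses string(strong), so leading, trailing, or repeated whitespace inside the <strong> itself would make the two sides differ and the paragraph would not be counted.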

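But, to be honest, if the goal is to stop users from using only bold text and the users are determined, they will probably find a way around it, for example by surrounding the <strong> element with Unicode whitespace characters or something similar.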

Answer 4

Your problem description suggests that you also want to catch

<p><strong>Builder's</strong><strong> tea</strong></p>

and perhaps also

<p><strong>Builder's</strong> <strong>tea</strong></p>

Some of the proposed solutions don't handle these cases.

But it isn't clear whether you also want to catch

<p><emph><strong>Builder's</strong> <strong> tea</strong></emph></p>

I think the closest XPath 2.0 gets to "any P tag text is entirely within strong/b tags" is

//p[empty(.//text()[normalize-space()] except .//strong//text())]

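This selects all p elements that have no non-whitespace descendant text node that is not also a descendant of a strong element within the p.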

I can't immediately see a way of doing this in XPath 1.0, but my XPath 1.0 is rather rusty.
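For what it's worth, PHP's DOMXPath only supports XPath 1.0, so the except operator is not available there. One possible XPath 1.0 approximation is to compare node counts: the set difference is empty exactly when the p has as many non-whitespace text nodes as there are non-whitespace text nodes under its strong descendants. The sketch below is an assumption on my part rather than something proposed in this answer; it uses <em> instead of <emph> so the HTML parser doesn't warn, and $matched is an illustrative name:

```php
<?php
$html = <<<'HTML'
<p><strong>Builder's</strong><strong> tea</strong></p>
<p><em><strong>Builder's</strong> <strong> tea</strong></em></p>
<p>Any text and <strong>maybe</strong> other <em>tags</em></p>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// XPath 1.0 stand-in for "empty(A except B)" where B is a subset of A:
// the difference is empty when both node sets have the same size.
$expression = '//p[
  count(.//text()[normalize-space()]) =
  count(.//strong//text()[normalize-space()])
]';

$matched = [];
foreach ($xpath->query($expression) as $p) {
    $matched[] = $document->saveXML($p);
}

print_r($matched);
```

Like the XPath 2.0 expression, this also selects a p with no non-whitespace text at all, so an extra predicate would be needed if empty paragraphs should be ignored.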

XPath: match first and last child
