Question

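I have a PHP variable that contains some HTML. I want to split the variable into two parts, with the split happening at the second bold tag (<strong> or <b>). Basically, if I have content that looks like this,

<strong>My content</strong><br>
This is my content. <strong>Some more bold</strong> content, that would spill into another variable.

Is this possible?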

Answer 1

Something like this would basically work:

preg_split('/(<strong>|<b>)/', $html1, 3, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Given your test string:

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';

you would end up with:

Array (
    [0] => <strong>
    [1] => My content</strong>This is my content.
    [2] => <b>
    [3] => Some more bold</b>content
)

Now, if your sample string does not start with a strong/b tag:

$html2 = 'like the first, but <strong>My content</strong>This is my content.<b>Some more bold</b>content, has some initial none-tag content';

Array (
    [0] => like the first, but 
    [1] => <strong>
    [2] => My content</strong>This is my content.
    [3] => <b>
    [4] => Some more bold</b>content, has some initial none-tag content
)

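Then a simple test of whether element #0 is a tag or text tells you where the "second tag and onward" part starts: the second tag sits at index 2 when the string begins with a tag and at index 3 when it begins with text, so the text following it is element #3 or element #4.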
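As a rough sketch of how that check could be wrapped up, the function below reuses the preg_split() call from above and returns the two halves. The name splitAtSecondBold is hypothetical, and instead of special-casing element #0 it simply counts tag elements, which covers both array layouts shown above. Like the pattern in the answer, it assumes lowercase tags without attributes.

```php
<?php

/**
 * Hypothetical helper: split $html just before the second <strong> or <b>
 * opening tag. Returns [before, fromSecondTagOnward], or null when the
 * string contains fewer than two such tags.
 */
function splitAtSecondBold(string $html): ?array
{
    $parts = preg_split(
        '/(<strong>|<b>)/',
        $html,
        3,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $tagsSeen = 0;
    foreach ($parts as $index => $part) {
        if ($part === '<strong>' || $part === '<b>') {
            $tagsSeen++;
            if ($tagsSeen === 2) {
                // Glue the pieces back together on each side of the second tag.
                return [
                    implode('', array_slice($parts, 0, $index)),
                    implode('', array_slice($parts, $index)),
                ];
            }
        }
    }

    return null; // fewer than two bold tags
}

$html1 = '<strong>My content</strong>This is my content.<b>Some more bold</b>content';
print_r(splitAtSecondBold($html1));
```

Because the pieces are re-joined with implode(), the second half keeps any later bold tags intact even if preg_split() happened to split on them as well.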

Answer 2

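You can use a "positive lookbehind" in a regular expression. For example, (?<=a)b matches the b (and only the b) in cab, but does not match anything in bed or debt.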
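To make the lookbehind behaviour concrete, here is a small sketch that runs the example pattern against the three words mentioned. The variable names are only for illustration.

```php
<?php

// (?<=a)b matches a "b" only when it is immediately preceded by an "a".
$pattern = '/(?<=a)b/';

$results = [];
foreach (['cab', 'bed', 'debt'] as $word) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[$word] = preg_match($pattern, $word) === 1;
}

var_dump($results); // cab => true, bed => false, debt => false
```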

In your case, (?<=(\<strong|\<b)).*(\<strong|\<b) should do the trick. Use this regex in a preg_split() call, and be sure to set PREG_SPLIT_DELIM_CAPTURE if you want the <b> or <strong> tags to be included in the result.

Answer 3

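If you really need to split the string, the regex approach may work. However, parsing HTML with regular expressions is fragile in many ways.

If you only need to find the second node that is a strong or b element, using the DOM is much easier. Not only is the code very obvious, all of the parsing is handled for you.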

<?php

$testHtml = '<p><strong>My content</strong><br>
This is my content. <strong>Some more bold</strong> content, that would spilt into another variable.</p>
<p><b>This should not be found</b></p>';

$htmlDocument = new DOMDocument;

if ($htmlDocument->loadHTML($testHtml) === false) {
  // crash and burn
  die();
}

$xPath = new DOMXPath($htmlDocument);
$boldNodes = $xPath->query('//strong | //b');

$secondNodeIndex = 1;

if ($boldNodes->item($secondNodeIndex) !== null) {
  $secondNode = $boldNodes->item($secondNodeIndex);
  var_dump($secondNode->nodeValue);
} else {
  // crash and burn
}
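Note that nodeValue returns only the text of the node. If the markup is needed as well, one possible variation (a sketch, not part of the original answer) is to pass the node to DOMDocument::saveHTML(). The shortened $html string and the variable names here are assumptions made for the example.

```php
<?php

$html = '<p><strong>My content</strong><br>This is my content. '
      . '<strong>Some more bold</strong> content.</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$second = $xpath->query('//strong | //b')->item(1); // second bold node in document order

$secondText = null;
$secondMarkup = null;
if ($second !== null) {
    $secondText = $second->nodeValue;        // text content only
    $secondMarkup = $doc->saveHTML($second); // the element serialized with its tags
}

var_dump($secondText, $secondMarkup);
```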
