How to split HTML at p / div / h1-6 tags that contain at least 10 words

Asked: 2014-05-23 13:13:36

Tags: php html

I need to split a string containing HTML at p / div / h1-6 tags, but only when the tag itself contains at least 10/20/30 words.

Examples:

1)

$string = '<div class="main"><span>a b c d</span><div><div><div><h3>a b c d e f g h i j</h3><div><p>In this paragraph <b>is not more than</b> ten words.</p></div></div></div></div></div>';

$output[0] = '<div class="main"><span>a b c d</span><div><div><div><h3>a b c d e f g h i j</h3>'; // only the h3 tag contains at least 10 words
$output[1] = '<div><p>In this paragraph <b>is not more than</b> ten words.</p></div></div></div></div></div>'; // the p tag has fewer than 10 words, so this is just the rest of the string

2)

$string = '<div class="main"><span>a b c d e f g h i j</span><div><div><div><h3>a b c d e f g h i j</h3><div><p>In this paragraph <b>is more than</b> ten words just now, right?.</p></div></div></div></div></div>';

$output[0] = '<div class="main"><span>a b c d e f g h i j</span>';
$output[1] = '<div><div><div><h3>a b c d e f g h i j</h3>';
$output[2] = '<div><p>In this paragraph <b>is more than</b> ten words just now, right?.</p></div></div></div></div></div>';

3)

$string = '<div class="main"><span>a b c d e f g h i j</span><div><div><div><h3>a b c d e f g h i j</h3><div><p>In this paragraph <b>is more than</b> ten words <p>Another paragraph with more than 10 words words words words words words</p> just now, right?.</p></div></div></div></div></div>';


$output[0] = '<div class="main"><span>a b c d e f g h i j</span>';
$output[1] = '<div><div><div><h3>a b c d e f g h i j</h3>';
$output[2] = 'In this paragraph <b>is more than</b> ten words <p>Another paragraph with more than 10 words words words words words words</p>';
$output[3] = ' just now, right?.</p></div></div></div></div></div>';

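I just need to split a long string containing HTML into smaller parts, but those parts must still be meaningful (because of context). Single words are too small a unit, and sentences are unreliable: splitting on periods, commas, dashes and so on does not work, because a period, for example, does not necessarily mark the end of a sentence, right?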
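
One possible reading of the rule above, as a minimal sketch rather than a definitive solution. The function name `splitHtmlByWordCount` and all of its internals are hypothetical. It assumes that a "word" is any whitespace-separated run of characters, that only p / div / h1-6 closing tags may trigger a cut, that words are counted only since the previous cut, and that trailing closing tags with no words are attached to the last part. It tokenizes with a regular expression instead of a real HTML parser, so comments, attributes containing `>` and badly nested markup are not handled.

```php
<?php
// Hypothetical helper: cut the HTML after each closing p/div/h1-6 tag whose
// element has collected at least $minWords words since the previous cut.
function splitHtmlByWordCount(string $html, int $minWords = 10): array
{
    // Split into tag tokens and text tokens, keeping the tags.
    $tokens = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $chunks = [];
    $buffer = '';   // markup collected since the last cut
    $words  = 0;    // running word total for the whole input
    $cutAt  = 0;    // value of $words at the last cut
    $stack  = [];   // value of $words when each open block element started

    foreach ($tokens as $token) {
        $buffer .= $token;

        if ($token[0] !== '<') {
            // Text token: count whitespace-separated words.
            $words += count(preg_split('/\s+/', $token, -1, PREG_SPLIT_NO_EMPTY));
            continue;
        }
        if (!preg_match('#^<(/?)(?:p|div|h[1-6])\b#i', $token, $m)) {
            continue; // span, b, etc. never trigger a cut in this sketch
        }
        if ($m[1] === '') {
            $stack[] = $words; // opening block tag
            continue;
        }
        // Closing block tag: words inside this element since the last cut.
        $start = array_pop($stack) ?? 0;
        if ($words - max($start, $cutAt) >= $minWords) {
            $chunks[] = $buffer;
            $buffer = '';
            $cutAt = $words;
        }
    }

    if ($buffer !== '') {
        if ($words === $cutAt && $chunks !== []) {
            // Only closing tags are left: attach them to the last part.
            $chunks[count($chunks) - 1] .= $buffer;
        } else {
            $chunks[] = $buffer;
        }
    }
    return $chunks;
}
```

Calling `splitHtmlByWordCount($string, 10)` on the `$string` from example 1 should produce the two parts listed there. Example 2 also cuts after `</span>`, which this sketch does not do; adding `span` to the tag pattern would be one way to approximate that. The parts are not well-formed HTML on their own, which matches the examples.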

0 answers:

No answers