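I want to create a script that automatically grabs the content of HTML tags (from the opening tag to the closing tag) and stores it in an array.
Example:
Input: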
$str = <p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>
Output:
$blocks[0] = <p>This is a sample <b>text</b> </p>
$blocks[1] = <p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>
NB: the first block starts with <p>, so it must stop at the matching </p>. The second block also starts with <p>, but it contains another paragraph (<p></p>) inside it, so it must stop at the final </p>. In other words, I want everything between the outer start and end tags, inner tags included.
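Answer 0 (score: 0)
I'll try to answer this, although this solution does not give you exactly what you asked for, because nested <p> tags are not valid HTML. Using PHP's DOMDocument, you can extract the paragraph tags like this.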
<?php
$test = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";

// Parse the string as HTML; the parser repairs the invalid nesting on its own terms.
$html = new DOMDocument();
$html->loadHTML($test);

// Serialize every <p> element the parser produced, tags included.
$p_tags = array();
foreach ($html->getElementsByTagName('p') as $p) {
    $p_tags[] = $html->saveHTML($p);
}
print_r($p_tags);
?>
After a few warnings caused by the invalid tag nesting, the output should look like this:
Array
(
[0] => <p>This is a sample <b>text</b> </p>
[1] => <p>This is </p>
[2] => <p>another text</p>
)
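The warnings mentioned above come from libxml, and PHP lets you buffer them instead of printing them. The following is only a sketch of that idea: `extractParagraphs()` is a hypothetical helper name, not part of DOMDocument, and the paragraphs it returns are still whatever the parser decides after repairing the nesting, so this does not recover the nested block from the question.

```php
<?php
// Hypothetical helper: same extraction as above, with libxml warnings buffered.
function extractParagraphs(string $markup): array
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    // Discard the buffered errors and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Serialize each <p> element the parser produced, tags included.
    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $paragraphs[] = $doc->saveHTML($p);
    }
    return $paragraphs;
}

$test = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";
$p_tags = extractParagraphs($test);
print_r($p_tags);
```

If you want to inspect the problems rather than drop them, calling `libxml_get_errors()` before `libxml_clear_errors()` returns them as an array.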
Answer 1 (score: 0)
You can do this with the Simple HTML DOM library. Here is an example.
require_once('simple_html_dom.php');

$markup = " <p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";

// Parse the string and select every <p> element.
$html = str_get_html($markup);
$p = $html->find('p');

$contentArray = array();
foreach ($p as $element) {
    // innertext drops the surrounding tag; use $element->outertext to keep it, i.e. <p>content</p>
    $contentArray[] = $element->innertext;
}
print_r($contentArray);
Your output will look like this:
Array
(
[0] => This is a sample <b>text</b>
[1] => This is
[2] => another text
)
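Both parsers split the nested paragraph, so neither output matches the two blocks requested in the question. If the input really has to be treated as raw text with balanced tags, one possible approach is to scan for opening and closing tags and count nesting depth. The sketch below assumes a hypothetical helper named `extractTopLevelBlocks()`. It is not an HTML parser: it assumes tag attributes contain no `>` character, and it counts a self-closing `<p/>` as an opening tag.

```php
<?php
// Hypothetical helper: returns each outermost <tag>...</tag> block,
// keeping any nested tags of the same name inside it.
function extractTopLevelBlocks(string $str, string $tag = 'p'): array
{
    $name = preg_quote($tag, '~');
    // Match every opening or closing tag with this name and record its byte offset.
    $pattern = '~<(/?)' . $name . '\b[^>]*>~i';
    preg_match_all($pattern, $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $blocks = [];
    $depth = 0;
    $start = 0;
    foreach ($matches as $m) {
        [$text, $offset] = $m[0];
        $isClosing = ($m[1][0] === '/');

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset; // a new outermost block begins here
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Closing tag of the outermost block: cut out the whole span.
                $blocks[] = substr($str, $start, $offset + strlen($text) - $start);
            }
        }
        // A closing tag seen at depth 0 is stray and is ignored.
    }
    return $blocks;
}

$str = '<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>';
$blocks = extractTopLevelBlocks($str);
print_r($blocks);
```

For the sample string this yields two entries: the first paragraph, and the second paragraph with its inner <p>another text</p> intact. The text between the blocks ("this is out of tags.") is skipped because it sits at depth 0.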