I'm trying to write a preg_replace that strips all attributes from the allowed tags and removes every tag that is not in the allowed list.
Basic example. This:
<p style="some styling here">Test<div class="button">Button Text</div></p>
becomes:
<p>Test</p>
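I have this working fine, except for img tags and a (link) tags. I don't want to strip the attributes from img and a tags, and possibly some others. Is there a way to set up two allowed lists?
1) One list of allowed tags that keep their attributes
2) One list of allowed tags that are reduced to the bare tag
3) Everything else is removed.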
Here is the script I'm working with:
$string = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';
$output = strip_tags($string, '<p><b><br><img><a>');
// Removes the attributes from every remaining opening tag, including img and a
$output = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i", '<$1$2>', $output);
echo $output;
The script should turn $string into:
<p>This is some text<br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>
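The two-list idea can be sketched with strip_tags() followed by preg_replace_callback(), which lets a callback decide per tag whether its attributes are kept. This is a minimal sketch, not a robust HTML sanitizer: cleanWithRegex, $keepAttrs and $stripAttrs are hypothetical names, tag names are assumed to be listed in lowercase, and the regex can be fooled by a > inside an attribute value or by comments.

```php
<?php
// Sketch of the "two allowed lists" idea using strip_tags() plus a regex callback.
// cleanWithRegex, $keepAttrs and $stripAttrs are hypothetical names for this example.
// Caveat: a regex does not parse HTML; a '>' inside an attribute value or a
// comment can break it.
function cleanWithRegex(string $html, array $keepAttrs, array $stripAttrs): string
{
    // 1) Remove every tag that is in neither list. strip_tags() keeps the
    //    text content of the tags it removes.
    $allowed = '';
    foreach (array_merge($keepAttrs, $stripAttrs) as $tag) {
        $allowed .= '<' . $tag . '>';
    }
    $html = strip_tags($html, $allowed);

    // 2) Rewrite opening tags: tags in $stripAttrs lose all attributes
    //    (a "/" right before ">" is kept); any other tag left after step 1
    //    belongs to $keepAttrs and is returned unchanged.
    return preg_replace_callback(
        '/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i',
        function (array $m) use ($stripAttrs): string {
            if (in_array(strtolower($m[1]), $stripAttrs, true)) {
                return '<' . $m[1] . $m[2] . '>';
            }
            return $m[0];
        },
        $html
    );
}

$string = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';
$output = cleanWithRegex($string, array('img', 'a'), array('p', 'br', 'b'));
echo $output;
```

Run on the sample string, img and a keep their attributes while p, br and b lose theirs. One difference from the expected output above remains: strip_tags() removes the div tags but keeps the text inside them, so "This is the button" stays in the result. Dropping an element together with its content needs a real parser, which is what the DOM-based answer below does.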
Answer (score: 1)
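This function removes disallowed child elements (together with their content), strips all attributes from the "stripped" child elements, and leaves the rest untouched, working recursively.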
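function clean($element, $allowed, $stripped){
    // $allowed: tags kept with their attributes
    // $stripped: tags kept, but with all attributes removed
    if (!is_array($allowed) || !is_array($stripped)) return;
    if (!$element) return;
    $toDelete = array();
    foreach ($element->childNodes as $child) {
        // Skip text, comment and other non-element nodes
        if (!isset($child->tagName)) continue;
        $n = $child->tagName;
        // Tag in neither list: mark it for removal (content included)
        if ($n && !in_array($n, $allowed) && !in_array($n, $stripped)) {
            $toDelete[] = $child;
            continue;
        }
        // Tag in the "stripped" list: collect the attribute names first, then remove them
        if ($n && in_array($n, $stripped)) {
            $attr = array();
            foreach ($child->attributes as $a)
                $attr[] = $a->nodeName;
            foreach ($attr as $a)
                $child->removeAttribute($a);
        }
        // Clean the kept element's own children
        clean($child, $allowed, $stripped);
    }
    // Remove after the loop so childNodes is not modified while it is being iterated
    foreach ($toDelete as $del)
        $element->removeChild($del);
}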
Here is the code that cleans the string:
$xhtml = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';
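$dom = new DOMDocument();
// loadHTML() wraps the fragment in <html><body>...</body></html>
$dom->loadHTML($xhtml);
$body = $dom->getElementsByTagName('body')->item(0);
// img and a keep their attributes; p, br and b are kept with their attributes removed
clean($body, array('img', 'a'), array('p', 'br', 'b'));
// saveHTML($body) returns <body>...</body>; the regex keeps only the inner markup
echo preg_replace('#^.*?<body>(.*?)</body>.*$#s', '$1', $dom->saveHTML($body));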
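The final regex that cuts the markup out of <body>...</body> can be avoided by serializing each child of body separately, since DOMDocument::saveHTML() accepts an optional node and returns only that node's markup. A minimal sketch; innerHtml is a hypothetical helper name, not part of PHP's DOM extension:

```php
<?php
// Hypothetical helper: concatenates the HTML of each child node,
// i.e. the "inner HTML" of $node without its own start and end tags.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p style="width: 250px;">Hi <b>there</b></p>');
$body = $dom->getElementsByTagName('body')->item(0);
echo innerHtml($body);
```

With this helper, the last line of the code above could become echo innerHtml($body);.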
You should look at the documentation for PHP's DOM classes.