I have prepared a whitelist of allowed styles, and I want to remove every style that is not on the whitelist from an HTML string.
$allowed_styles = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
$html = 'xyz html';
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach ($elements as $element) {
    foreach ($element->childNodes as $child) {
        if ($child->hasAttribute('style')) {
            $style = strtolower(trim($child->getAttribute('style')));
            // match and get only the CSS property names
            preg_match_all('/(?<names>[a-z\-]+):/', $style, $matches);
            for ($i = 0; $i < sizeof($matches["names"]); $i++) {
                $style_property = $matches["names"][$i];
                // if the CSS property is not in the allowed styles array,
                // remove the whole style attribute from this child
                if (!in_array($style_property, $allowed_styles)) {
                    $child->removeAttribute('style');
                    continue;
                }
            }
        }
    }
}
$html_output = $dom->saveHTML();
I tested it with many HTML strings and it worked fine every time. But when I tried to filter this HTML string:
$html_string = '<div style="font-style: italic; text-align: center;
background-color: red;">On The Contrary</div><span
style="font-style: italic; background-color: rgb(244, 249, 255);
font-size: 32px;"><b style="text-align: center;
background-color: rgb(255, 255, 255);">This is USA</b></span>';
every disallowed style in the string is removed except on this line:
<b style="text-align: center; background-color: rgb(255, 255, 255);">
Can anyone suggest another efficient and robust way to remove the styles that are not on the whitelist?
Answer 0: (score: 1)
For this (and any other nested) HTML, you have to use a recursive function like this:
$html = 'your html';
$allowed_styles = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
$html_string = '<body>' . $html . '</body>';
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$elements = $dom->getElementsByTagName('body');
foreach ($elements as $element) {
    clearHtml($element, $allowed_styles);
}
$html_output = $dom->saveHTML();

// Walks the tree depth-first and drops the whole style attribute from any
// element that uses a property outside $allowed_styles.
function clearHtml($tree, $allowed_styles) {
    // Only element nodes carry attributes; text and comment nodes do not.
    if ($tree instanceof DOMElement && $tree->hasAttribute('style')) {
        $style = strtolower(trim($tree->getAttribute('style')));
        preg_match_all('/(?<names>[a-z\-]+):/', $style, $matches);
        foreach ($matches['names'] as $style_property) {
            if (!in_array($style_property, $allowed_styles)) {
                $tree->removeAttribute('style');
                break; // the attribute is gone, no need to check the rest
            }
        }
    }
    if ($tree->childNodes) {
        foreach ($tree->childNodes as $child) {
            clearHtml($child, $allowed_styles);
        }
    }
}
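If you would rather avoid writing the recursion by hand, the same idea can probably be expressed with DOMXPath, which can select every element that has a style attribute at any depth. This is only a sketch: the function name stripDisallowedStyleAttributes is made up for illustration, and it keeps the behaviour of the code above (the whole style attribute is dropped as soon as one property is not allowed).

```php
<?php
// Hypothetical helper, not part of the answer above: removes the whole style
// attribute from every element that uses a property outside the whitelist.
function stripDisallowedStyleAttributes(DOMDocument $dom, array $allowed_styles): void
{
    $xpath = new DOMXPath($dom);
    // '//*[@style]' matches every element with a style attribute, however deeply nested.
    foreach (iterator_to_array($xpath->query('//*[@style]')) as $element) {
        $style = strtolower($element->getAttribute('style'));
        // Collect the property names (the part before each colon).
        preg_match_all('/([a-z\-]+)\s*:/', $style, $matches);
        // array_diff() returns the names that are not whitelisted.
        if (array_diff($matches[1], $allowed_styles)) {
            $element->removeAttribute('style');
        }
    }
}

$allowed_styles = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
$dom = new DOMDocument();
$dom->loadHTML('<body><span style="color: red;"><b style="text-align: center; '
    . 'background-color: rgb(255, 255, 255);">This is USA</b></span></body>');
stripDisallowedStyleAttributes($dom, $allowed_styles);
$html_output = $dom->saveHTML();
```

Here the outer span keeps its style because color is whitelisted, while the nested b loses its style attribute because of background-color.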
Answer 1: (score: 1)
Similar to Oleja's solution, but this one removes only the disallowed properties instead of the whole style attribute.
// Defined as a method of a class; call it like this:
// $this->removeStylesheet($doc, ['color', 'font-weight']);
function removeStylesheet($tree, $allowed_styles) {
    // Only element nodes carry attributes; text and comment nodes do not.
    if ($tree instanceof DOMElement && $tree->hasAttribute('style')) {
        // Note: strtolower() also lowercases the property values.
        $style = strtolower(trim($tree->getAttribute('style')));
        preg_match_all('/(?<names>[a-z\-]+) *:(?<values>[^\'";]+)/', $style, $matches);
        $new_styles = array();
        for ($i = 0; $i < sizeof($matches['names']); $i++) {
            // Keep only the declarations whose property name is whitelisted.
            if (in_array($matches['names'][$i], $allowed_styles)) {
                $new_styles[] = $matches['names'][$i] . ':' . $matches['values'][$i];
            }
        }
        if ($new_styles) {
            $tree->setAttribute('style', implode(';', $new_styles));
        } else {
            $tree->removeAttribute('style');
        }
    }
    if ($tree->childNodes) {
        foreach ($tree->childNodes as $child) {
            $this->removeStylesheet($child, $allowed_styles);
        }
    }
}
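To try this outside a class, here is a minimal standalone sketch of the same approach, applied to the HTML string from the question. The name filterInlineStyles is hypothetical. Unlike the regex above, it splits each declaration on ';' and on the first ':' so that values keep their original case; it assumes that no value contains a ';' of its own.

```php
<?php
// Hypothetical standalone variant: keeps only whitelisted declarations in each
// style attribute. Assumes declarations are separated by ';' and that values
// never contain a ';' themselves.
function filterInlineStyles(DOMNode $node, array $allowed_styles): void
{
    if ($node instanceof DOMElement && $node->hasAttribute('style')) {
        $kept = array();
        foreach (explode(';', $node->getAttribute('style')) as $declaration) {
            // Split "name: value" at the first colon only.
            $parts = explode(':', $declaration, 2);
            if (count($parts) !== 2) {
                continue; // empty or malformed declaration
            }
            $name = strtolower(trim($parts[0]));
            $value = trim($parts[1]);
            if ($value !== '' && in_array($name, $allowed_styles, true)) {
                $kept[] = $name . ': ' . $value;
            }
        }
        if ($kept) {
            $node->setAttribute('style', implode('; ', $kept));
        } else {
            $node->removeAttribute('style');
        }
    }
    // Recurse so that nested elements are filtered too.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            filterInlineStyles($child, $allowed_styles);
        }
    }
}

$allowed_styles = array('font-size', 'color', 'font-family', 'text-align', 'margin-left');
$html = '<div style="font-style: italic; text-align: center; background-color: red;">On The Contrary</div>'
    . '<span style="font-style: italic; background-color: rgb(244, 249, 255); font-size: 32px;">'
    . '<b style="text-align: center; background-color: rgb(255, 255, 255);">This is USA</b></span>';
$dom = new DOMDocument();
$dom->loadHTML('<body>' . $html . '</body>');
filterInlineStyles($dom->documentElement, $allowed_styles);
$html_output = $dom->saveHTML();
```

With this input, the nested b element keeps text-align: center and loses only background-color, which is the case the original code missed.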