Question

I accept reviews as comments on my website, and I want to allow a few HTML tags, such as

  <h2>, <h3>, and so on,

while disallowing others.

I also use a function that looks for certain substrings and replaces them with emoticons, for example '<3' for a heart and ':D' for lol.

The problem appears when I use the following sanitizeHTML() function:

public function sanitizeHTML($inputHTML, $allowed_tags = array('<h2>', '<h3>', '<p>', '<br>', '<b>', '<i>', '<a>', '<ul>', '<li>', '<blockquote>', '<span>', '<code>', '<img>')) {
    $_allowed_tags = implode('', $allowed_tags);
    $inputHTML = strip_tags($inputHTML, $_allowed_tags);
    return preg_replace('#<(.*?)>#ise', "'<' . $this->removeBadAttributes('\\1') . '>'", $inputHTML);
}

function removeBadAttributes($inputHTML) {
    $bad_attributes = 'onerror|onmousemove|onmouseout|onmouseover|' . 'onkeypress|onkeydown|onkeyup|javascript:';
    return stripslashes(preg_replace("#($bad_attributes)(\s*)(?==)#is", 'SANITIZED ', $inputHTML));
}

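It removes the bad attributes and keeps only the allowed tags, but when the string contains <3 for a heart, the function removes everything that comes after the <3.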

Note:

Emoticon codes that do not contain the HTML special characters < or > work fine.

Answer 1

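You are using PCRE to parse HTML, which is never a good idea. The expression <(.*?)> matches everything from a < up to the next >. You need something more like <[^>]+>. That still has problems, though, and it will also catch <3. You could use a negative lookahead, <(?!3)[^>]+>, to handle that specific case, but there are many other cases to consider. You may want to use a DOM parser instead.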
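
To see how the three expressions above behave, here is a small sketch that runs them against one sample string. The sample text and the variable names are made up for illustration and are not part of the question's code.

```php
<?php
// Hypothetical sample input: an emoticon followed by a real tag.
$input = 'I <3 you <b>so</b> much';

// Lazy pattern from the question: starts at the first "<" (the emoticon)
// and runs to the next ">", swallowing the text in between.
preg_match_all('#<(.*?)>#s', $input, $lazy);
// $lazy[0] === ['<3 you <b>', '</b>']

// Negated character class: it still starts at "<3".
preg_match_all('#<[^>]+>#', $input, $negated);
// $negated[0] === ['<3 you <b>', '</b>']

// Negative lookahead: a "<" followed by "3" is skipped,
// so only the real tags are matched.
preg_match_all('#<(?!3)[^>]+>#', $input, $lookahead);
// $lookahead[0] === ['<b>', '</b>']

var_export([$lazy[0], $negated[0], $lookahead[0]]);
```

The lookahead only covers this one emoticon; any other "<" that is not part of a tag would need its own handling.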

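One of the other cases is strip_tags() itself. The PHP manual warns that it does not validate HTML, so a partial or broken tag can remove more text than expected, and a bare <3 looks like the start of a tag. As a rough sketch, and not a substitute for a DOM parser, the question's functions could escape every "<" that cannot begin a tag before calling strip_tags(). The escaping step is an assumption added here, not part of the original code. The sketch also replaces the /e modifier (removed in PHP 7.0) with preg_replace_callback() and uses plain functions instead of methods so that it is self-contained.

```php
<?php
// Sketch only: same bad-attribute list as in the question.
function removeBadAttributes(string $tagSource): string
{
    $badAttributes = 'onerror|onmousemove|onmouseout|onmouseover|'
        . 'onkeypress|onkeydown|onkeyup|javascript:';
    // Replace a listed name that is followed by "=" (optionally after spaces).
    // stripslashes() is not needed here because no /e evaluation takes place.
    $clean = preg_replace("#($badAttributes)(\s*)(?==)#is", 'SANITIZED ', $tagSource);
    return $clean ?? $tagSource;
}

function sanitizeHTML(
    string $inputHTML,
    array $allowedTags = ['<h2>', '<h3>', '<p>', '<br>', '<b>', '<i>', '<a>',
        '<ul>', '<li>', '<blockquote>', '<span>', '<code>', '<img>']
): string {
    // Assumption: a "<" that is not followed by a letter, or by "/" and a
    // letter, cannot start a tag, so it is turned into "&lt;" up front.
    $escaped = preg_replace('#<(?!/?[a-z])#i', '&lt;', $inputHTML) ?? $inputHTML;

    // Drop every tag that is not in the allowed list.
    $stripped = strip_tags($escaped, implode('', $allowedTags));

    // Clean the inside of each remaining tag with a callback.
    $result = preg_replace_callback(
        '#<([^>]+)>#',
        function (array $m): string {
            return '<' . removeBadAttributes($m[1]) . '>';
        },
        $stripped
    );
    return $result ?? $stripped;
}

echo sanitizeHTML('I <3 you <b onmouseover=x>hi</b><script>x</script>');
// I &lt;3 you <b SANITIZED =x>hi</b>x
```

With this in place, the emoticon replacement would have to look for '&lt;3' rather than '<3'. Text such as a<b still looks like a tag to strip_tags(), so the sketch narrows the problem rather than solving it.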