Convert encoded HTML entities to UTF-8

Time: 2017-11-10 13:02:55

Tags: php utf-8 xss

How can I convert this string to UTF-8:

  

&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041

I also want to convert this:

  

&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29

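I want to prevent XSS attacks, and I am using this article as a cheat sheet: https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet

My strategy is to convert the strings above to UTF-8 and check whether the result contains "javascript".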

1 answer:

Answer 0: (score: 1)

I wrote two simple functions that decode this kind of HTML; take a look:

$decimalHTML = '&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041';
$hexHTML = '&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29';

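function getDecimalHTML($str) {
    // Decode decimal character references such as &#0000106 or &#106;
    // (the trailing ';' is optional in the input).
    return preg_replace_callback(
        '/&#(\d+);?/',
        function ($m) {
            // $m[1] holds the digits; chr() yields a single byte, so only
            // ASCII code points are decoded correctly.
            return chr((int) $m[1]);
        },
        $str
    );
}

function getHexDecimalHTML($str) {
    // Decode two-digit hexadecimal references such as &#x6A or &#x6A;.
    // Matching the whole reference leaves any literal "x" in the text intact.
    return preg_replace_callback(
        '/&#x([0-9a-f]{2});?/i',
        function ($m) {
            // hex2bin() needs an even number of hex digits; two digits give one byte.
            return hex2bin($m[1]);
        },
        $str
    );
}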

echo getDecimalHTML($decimalHTML) . "\n";
echo getHexDecimalHTML($hexHTML);

This gives me:

javascript:alert('XSS')
javascript:alert('XSS') 

I used chr to get each character from its ASCII code, and hex2bin to get the string from the hexadecimal code.
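Since the question asks for UTF-8 output, the same idea can be sketched as one function that handles both notations and code points beyond ASCII. This is only a minimal sketch, not the answer's original code: the name decodeNumericEntities is hypothetical, and it assumes PHP 7.2+ with the mbstring extension (for mb_chr).

```php
<?php
// Hypothetical helper (not part of the original answer): decodes decimal and
// hexadecimal numeric character references, with or without a trailing ';'.
function decodeNumericEntities(string $str): string
{
    return preg_replace_callback(
        // Leading zeros are skipped; at most 6 hex or 7 decimal digits are kept,
        // which is enough to cover every valid Unicode code point.
        '/&#(?:x0*([0-9a-f]{1,6})|0*(\d{1,7}));?/i',
        function (array $m): string {
            // Group 1 is non-empty for hex references, group 2 for decimal ones.
            $codePoint = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            // mb_chr() returns false for invalid code points (e.g. surrogates);
            // in that case the reference is left untouched.
            $char = mb_chr($codePoint, 'UTF-8');
            return $char === false ? $m[0] : $char;
        },
        $str
    );
}

// Mixed decimal/hex input, with and without semicolons.
echo decodeNumericEntities('&#x6A&#0000097&#x76&#97;'), "\n"; // java
```

Unlike chr, mb_chr returns a UTF-8 encoded string for any valid code point, so a reference like &#x4E2D; decodes to a multi-byte character instead of a truncated byte.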

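I would recommend not reinventing the wheel and using a library that suits you instead; such libraries cover all aspects of this problem, for example AntiXSS.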
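Where a hand-rolled check is used anyway, comparing the URL scheme against an allowlist tends to be more robust than searching the decoded text for "javascript". The following is a rough sketch under stated assumptions: isSafeUrl and its default allowlist are hypothetical, it is not the AntiXSS API, and it expects a string whose entities have already been decoded.

```php
<?php
// Hypothetical helper: the name and the default allowlist are illustrative assumptions.
function isSafeUrl(string $decodedUrl, array $allowedSchemes = ['http', 'https', 'mailto']): bool
{
    // Browsers ignore whitespace and control characters inside a scheme
    // (e.g. "jav\tascript:"), so remove them before inspecting it.
    $compact = preg_replace('/[\x00-\x20]+/', '', $decodedUrl);

    // No scheme at all means a relative URL, which this sketch accepts.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $compact, $m)) {
        return true;
    }

    // Accept only schemes that are explicitly listed.
    return in_array(strtolower($m[1]), $allowedSchemes, true);
}

var_dump(isSafeUrl("javascript:alert('XSS')")); // bool(false)
var_dump(isSafeUrl('https://example.com/'));    // bool(true)
```

An allowlist rejects schemes nobody thought to block (vbscript:, data:, and so on), which a search for one known-bad word would miss.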