htmlentities not working for emoji

Asked: 2016-01-22 21:22:37

Tags: php html-entities emoji

I am trying to display the HTML entities of characters:

echo htmlentities(htmlentities("&"));
//outputs &amp;
echo htmlentities(htmlentities("<"));
//outputs &lt;

But it does not seem to work for emoji:

echo htmlentities(htmlentities("😎"));
//outputs 😎

How can I get it to output &#128526; instead?

Edit:

I am trying to display a user-supplied string with every character HTML-entity encoded: echo htmlentities(htmlentities($input))

Example: "this & that 😎" -> "this &amp; that &#128526;"

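3 answers:

Answer 0 (score: 9)

This works for regular HTML entities, for UTF-8 emoji (and other multi-byte UTF-8 content), and for plain strings.

The only problem I ran into was with empty string values, so I had to add that check to the function.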

function entities( $string ) {
    $stringBuilder = "";
    $offset = 0;

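    // Return early for empty input; empty() would also treat the string "0" as empty.
    if ( $string === null || $string === "" ) {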
        return "";
    }

    while ( $offset >= 0 ) {
        $decValue = ordutf8( $string, $offset );
        $char = unichr($decValue);

        $htmlEntited = htmlentities( $char );
        if( $char != $htmlEntited ){
            $stringBuilder .= $htmlEntited;
        } elseif( $decValue >= 128 ){
            $stringBuilder .= "&#" . $decValue . ";";
        } else {
            $stringBuilder .= $char;
        }
    }

    return $stringBuilder;
}

// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset,1));
    if ($code >= 128) {        //otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2;                //110xxxxx
        else if ($code < 240) $bytesnumber = 3;        //1110xxxx
        else if ($code < 248) $bytesnumber = 4;    //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;        //10xxxxxx
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}

// source - http://php.net/manual/en/function.chr.php#88611
function unichr($u) {
    return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}

/* ---- */

var_dump( entities( "&" ) ) . "\n";
var_dump( entities( "<" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "☚" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "A" ) ) . "\n";
var_dump( entities( "Hello  world" ) ) . "\n";
var_dump( entities( "this & that 😎" ) ) . "\n";
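
For comparison, the same output can probably be produced without the hand-written decoder. The sketch below assumes the mbstring extension is loaded; the function name entitiesNumeric is made up for illustration and is not part of the answer. It lets htmlentities() handle everything that has a named entity, then asks mb_encode_numericentity() to turn whatever non-ASCII characters remain into decimal references. It also avoids the 'HTML-ENTITIES' pseudo-encoding used by unichr() above, which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: named entities first, numeric references for the rest.
function entitiesNumeric(string $input): string
{
    // Named entities (&amp;, &lt;, &eacute;, ...) are produced here.
    $escaped = htmlentities($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Conversion map: [first code point, last code point, offset, mask].
    // Covers every code point from U+0080 up to U+10FFFF.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    // Anything still outside ASCII becomes a decimal reference such as &#128526;
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

echo entitiesNumeric("this & that \xF0\x9F\x98\x8E"), "\n";
// prints: this &amp; that &#128526;
```

To show the entities themselves in a browser, as in the question, the result would still need one more pass through htmlspecialchars().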

Answer 1 (score: 2)

$emoji = "\xF0\x9F\x98\x8E"; // your emoji

I got this callback from "convert unicode to html entities hex":
$hex = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);
    $utf = iconv('UTF-8', 'UCS-4', $char);
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $emoji);

echo $hex;

echo json_encode(("\xF0\x9F\x98\x8E")); // decoded; htmlentities did nothing with it.

Will this do?
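
The callback above emits hexadecimal references (&#x1F60E;), while the question asks for the decimal form. A minimal sketch of a decimal variant follows, assuming valid UTF-8 input; the helper names codePoint and encodeAll are hypothetical. It uses the same regular expression but decodes each matched character by hand, so it needs neither iconv nor mbstring.

```php
<?php
// Hypothetical helper: decode one UTF-8 encoded character to its code point.
function codePoint(string $char): int
{
    $bytes = array_values(unpack('C*', $char));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0]; // plain ASCII
    }
    // Payload mask of the lead byte: 0x1F (2 bytes), 0x0F (3 bytes), 0x07 (4 bytes).
    $code = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        // Each continuation byte contributes its low six bits.
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: named entities first, decimal references for the rest.
function encodeAll(string $input): string
{
    $escaped = htmlentities($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function (array $m): string {
        return '&#' . codePoint($m[0]) . ';';
    }, $escaped);
}

echo encodeAll("this & that \xF0\x9F\x98\x8E"), "\n";
// prints: this &amp; that &#128526;
```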

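Answer 2 (score: 2)

The htmlentities documentation states:

  All characters which have HTML character entity equivalents are translated into these entities.

Your emoji has no named equivalent in the way that &lt; stands for <, so it cannot be converted. &#128526; is just a numeric HTML code (a numeric character reference), not a named HTML entity.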
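
The distinction can be checked directly. This is only a small sketch; the variable name $emoji is chosen for illustration.

```php
<?php
$emoji = "\xF0\x9F\x98\x8E"; // U+1F60E, code point 128526

// No named entity exists for it, so htmlentities() returns the input unchanged.
var_dump(htmlentities($emoji, ENT_QUOTES, 'UTF-8') === $emoji);

// The numeric reference nevertheless decodes back to the same character.
var_dump(html_entity_decode('&#128526;', ENT_QUOTES, 'UTF-8') === $emoji);

// For contrast, a character that does have a named entity.
echo htmlentities('<', ENT_QUOTES, 'UTF-8'), "\n";
```

Both var_dump() calls should report true, and the last line should print &lt;.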

function htmlEntitiesOrCode($string) {
    //try htmlentities first
    $result = htmlentities($string, ENT_COMPAT, "UTF-8");

    //if the output is different from input, an entity was returned
    if ($result != $string) {
        return $result;
    }

    //get the html code
    $offset = 0;
    $code = ord(substr($string, $offset,1));
    if ($code >= 128) {
        if ($code < 224) {
            $bytesnumber = 2;
        } else if ($code < 240) {
            $bytesnumber = 3;
        } else if ($code < 248) {
            $bytesnumber = 4;
        }
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) {
        $offset = -1;
    }

    // A numeric character reference must end with a semicolon
    $result = "&#" . $code . ";";
    return $result;
}

The code that computes the HTML code was taken from here: http://php.net/manual/en/function.ord.php#109812