How to print hexadecimal UTF-8 characters in PHP

Time: 2016-06-19 21:38:53

Tags: php encoding utf-8

How can I print a UTF-8 character from its hexadecimal UTF-8 value? I looked at this post, but it did not solve my problem...

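I work with many strings, which are Sanskrit words stored in a database. I have their HTML values, 16-bit binary code points, hexadecimal codes, and decimal codes, but I would like to be able to take their hexadecimal UTF-8 values and output their symbol forms.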

For example, take the word आम, whose binary UTF-8 value is 111000001010010010000110111000001010010010101110. I want to view/store/print its hexadecimal UTF-8 value and print its symbol form.

For example, here is my code snippet:

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

$Temporary = dechex(bindec($BinaryUTF8));

$HexadecimalUTF8 = NULL;

for($i = 0; $i < strlen($Temporary); $i+=2)
{
    $HexadecimalUTF8 .= "\x".$Temporary[$i].$Temporary[$i+1];
}

$Test = "\xe0\xa4\x86\xe0\xa4\xae";

echo "\$Test = ".$Test;

echo "<br>";

echo "\$HexadecimalUTF8 = ".$HexadecimalUTF8;

The output is:

$Test = आम
$HexadecimalUTF8 = \xe0\xa4\x86\xe0\xa4\xae

$Test outputs the desired characters.

Why doesn't $HexadecimalUTF8 output the desired characters?

2 answers:

Answer 0: (score: 2)

Your binary value was wrong (I fixed it).

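You are building a string that contains the literal text "\xe0" rather than the character it represents. PHP only interprets \x escapes that are written inside a double-quoted literal in the source code; the hex you concatenate at run time is just a number written as text.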
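To illustrate the point, here is a minimal sketch (the variable names `$parsed` and `$built` are made up for this example). It compares an escape written in a double-quoted literal with the same characters assembled at run time:

```php
<?php
// Escape sequence inside a double-quoted literal: resolved by the parser to one byte.
$parsed = "\xe0";

// The same characters assembled at run time: no escape processing happens.
$built = '\x' . 'e0';

var_dump(strlen($parsed)); // int(1)
var_dump(strlen($built));  // int(4)
var_dump(bin2hex($built)); // string(8) "5c786530"
```

The last line shows that `$built` holds the four bytes for a backslash, `x`, `e`, and `0`, not the single byte 0xE0.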

This seems to work now:

<?php
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

$Temporary = dechex(bindec($BinaryUTF8));

$HexadecimalUTF8 = NULL;

for($i = 0; $i < strlen($Temporary); $i+=2)
{
    $HexadecimalUTF8 .= '\x' . $Temporary[$i].$Temporary[$i+1];
}

$Test = "\xe0\xa4\x86\xe0\xa4\xae";

echo "\$Test = ".$Test;

echo "<br>";
echo "\$HexadecimalUTF8 = " . makeCharFromHex($HexadecimalUTF8);

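function makeCharFromHex($hex) {
    return preg_replace_callback(
        // Match a literal backslash and "x", capturing only the two hex digits.
        '#\\\\x([0-9A-F]{2})#i',
        function ($matches) {
            // Convert the captured digits to a number, then to the matching byte.
            return chr(hexdec($matches[1]));
        },
        $hex
    );
}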
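As a possible shortcut, assuming the hex digits are all that is needed, the `\x` prefixes and the regex can be skipped: `hex2bin()` (PHP 5.4+) turns a string of hex digit pairs directly into the bytes they represent. The helper name `hexToUtf8` is hypothetical, and the type declarations assume PHP 7 or later. Like the code above, the usage lines rely on `bindec()` and `dechex()`, so they assume a 64-bit build and an input short enough to fit in an integer.

```php
<?php
// Hypothetical helper: convert a string of hex digits (UTF-8 bytes) into raw bytes.
function hexToUtf8(string $hex): string
{
    // Require complete pairs of hex digits so hex2bin() cannot fail.
    if (!preg_match('/^(?:[0-9a-f]{2})+$/i', $hex)) {
        throw new InvalidArgumentException("Not a valid hex string: $hex");
    }
    return hex2bin($hex);
}

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
$hex = dechex(bindec($BinaryUTF8)); // "e0a486e0a4ae"
echo hexToUtf8($hex);               // आम
```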

This question reminds me how poor PHP's multibyte support is.

Answer 1: (score: 1)

To print a UTF-8 character from its decimal code point value, you can use this function:

<?php

function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}

echo chr_utf8(9405).chr_utf8(9402).chr_utf8(9409).chr_utf8(hexdec('24C1')).chr_utf8(9412);

// Output ⒽⒺⓁⓁⓄ

// Note : Use hexdec to print UTF-8 encoded characters from hexadecimal number.
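If the mbstring extension is available (an assumption about your environment), PHP 7.2 and later provide `mb_chr()`, which performs the same conversion from code point to UTF-8 as the hand-written function. A short sketch for comparison:

```php
<?php
// mb_chr() takes a Unicode code point and returns the character in the given encoding.
echo mb_chr(0x0906, 'UTF-8') . mb_chr(0x092E, 'UTF-8'); // आम

// The raw UTF-8 bytes can be inspected with bin2hex().
echo "\n" . bin2hex(mb_chr(0x0906, 'UTF-8')); // e0a486
```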

For your snippet, you can try this... and see it at https://eval.in/748161
<?php

// function chr_utf8 shown above is required…

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";

if (preg_match_all('#(0[01]{7})|(?:110([01]{5})10([01]{6}))|(?:1110([01]{4})10([01]{6})10([01]{6}))|(?:11110([01]{3})10([01]{6})10([01]{6})10([01]{6}))#',$BinaryUTF8,$a,PREG_SET_ORDER))
$result=implode('',array_map(function($n){return chr_utf8(bindec(implode('',array_slice($n,1))));},$a));

echo $result;

// Output आम

// Note : If you work with "binary" the length of input must be multiple of 8.
// You can't remove leading zeros because this regex will not detect the character…

Another nice inline solution is the following... (PHP 5.6+ required); see it at https://eval.in/748162

<?php

$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
echo pack('C*',...array_map('bindec',str_split($BinaryUTF8,8)));

// Output आम

// Note : Length of input $BinaryUTF8 must be a multiple of 8.
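The length requirement in the note can be checked before converting. The sketch below wraps the one-liner in a hypothetical function, `bitsToUtf8`, that rejects input whose length is not a multiple of 8 (type declarations assume PHP 7+). It also uses `bin2hex()` to go the other way, from the decoded string back to its hexadecimal UTF-8 value, which the question asks for as well.

```php
<?php
// Hypothetical wrapper around the one-liner above, with input validation.
function bitsToUtf8(string $bits): string
{
    // Accept only 0/1 digits in complete groups of 8.
    if (!preg_match('/^(?:[01]{8})+$/', $bits)) {
        throw new InvalidArgumentException('Expected a bit string whose length is a multiple of 8');
    }
    // Each 8-bit group becomes one byte.
    return pack('C*', ...array_map('bindec', str_split($bits, 8)));
}

$word = bitsToUtf8("111000001010010010000110111000001010010010101110");
echo $word, "\n";          // आम
echo bin2hex($word), "\n"; // e0a486e0a4ae
```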