如何从十六进制UTF-8值中打印UFT-8字符?我看了this帖子,但它没有解决我的问题......
我处理许多字符串,这些字符串是存储在数据库中的梵文单词。我有他们的HTML值,16位二进制代码点,十六进制代码和十进制代码,但我希望能够使用他们的十六进制UTF-8 值并输出它们的符号形式。
例如,这是一个单词आम
,其二进制UTF-8值为111000001010010010111000111000001010010010101110
。我想查看/存储/打印其十六进制UTF-8值并打印其符号形式。
例如,这是我的代码片段:
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
$Temporary = dechex(bindec($BinaryUTF8));
$HexadecimalUTF8 = NULL;
for($i = 0; $i < strlen($Temporary); $i+=2)
{
$HexadecimalUTF8 .= "\x".$Temporary[$i].$Temporary[$i+1];
}
$Test = "\xe0\xa4\x86\xe0\xa4\xae";
echo "\$Test = ".$Test;
echo "<br>";
echo "\$HexadecimalUTF8 = ".$HexadecimalUTF8;
输出结果为:
$Test = आम
$HexadecimalUTF8 = \xe0\xa4\x86\xe0\xa4\xae
$测试输出所需的字符。
为什么$ HexadecimalUTF8没有输出所需的字符?
答案 0 :(得分:2)
您的二进制文件错误(我已修复它)
你正在制作一个包含文字&#34; \ xe0&#34;的字符串。而不是代表它的字符,十六进制只是一个数字。
这似乎现在有效
<?php
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
$Temporary = dechex(bindec($BinaryUTF8));
$HexadecimalUTF8 = NULL;
for($i = 0; $i < strlen($Temporary); $i+=2)
{
$HexadecimalUTF8 .= '\x' . $Temporary[$i].$Temporary[$i+1];
}
$Test = "\xe0\xa4\x86\xe0\xa4\xae";
echo "\$Test = ".$Test;
echo "<br>";
echo "\$HexadecimalUTF8 = " . makeCharFromHex($HexadecimalUTF8);
function makeCharFromHex($hex) {
return preg_replace_callback(
'#(\\\x[0-9A-F]{2})#i',
function ($matches) {
return chr(hexdec($matches[1]));
},
$hex
);
}
这个问题让我想起PHP对于多字节支持有多糟糕
答案 1 :(得分:1)
要从十进制值打印UTF-8字符,您可以使用此功能
<?php
function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}
echo chr_utf8(9405).chr_utf8(9402).chr_utf8(9409).chr_utf8(hexdec('24C1')).chr_utf8(9412);
// Output ⒽⒺⓁⓁⓄ
// Note : Use hexdec to print UTF-8 encoded characters from hexadecimal number.
对于您的代码段,您可以尝试此操作...并在https://eval.in/748161
中查看<?php
// function chr_utf8 shown above is required…
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
if (preg_match_all('#(0[01]{7})|(?:110([01]{5})10([01]{6}))|(?:1110([01]{4})10([01]{6})10([01]{6}))|(?:11110([01]{3})10([01]{6}),10([01]{6})10([01]{6}))#',$BinaryUTF8,$a,PREG_SET_ORDER))
$result=implode('',array_map(function($n){return chr_utf8(bindec(implode('',array_slice($n,1))));},$a));
echo $result;
// Output आम
// Note : If you work with "binary" the length of input must be multiple of 8.
// You can't remove leading zeros because this regex will not detect the character…
另一个不错的内联解决方案如下......( php v5.6 + required )在https://eval.in/748162中查看
<?php
$BinaryUTF8 = "111000001010010010000110111000001010010010101110";
echo pack('C*',...array_map('bindec',str_split($BinaryUTF8,8)));
// Output आम
// Note : Length or $BinaryUTF8 of input must be multiple of 8.