Question

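Given the Unicode decimal or hex number of a character that I want to output from a CLI PHP script, how can PHP generate it? The chr() function doesn't seem to produce the right output. Here is my test script, using the section sign U+00A7 (A7 in hex, 167 in decimal, which should be represented as C2 A7 in UTF-8) as a test: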

<?php
echo "Section sign: ".chr(167)."\n"; // Using CHR function
echo "Section sign: ".chr(0xA7)."\n";
echo "Section sign: ".pack("c", 0xA7)."\n"; // Using pack function?
echo "Section sign: §\n"; // Copy and paste of the symbol into source code

The output I get (over an SSH session to the server) is:

Section sign: ?
Section sign: ?
Section sign: ?
Section sign: §

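So this proves that the terminal font I'm using contains the section sign and that the SSH connection is sending it through successfully, but chr() isn't building it correctly from the code number.

If all I have is the code number and no copy/paste option, what options do I have?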

Answer 1

Assuming you have iconv, here's a simple way that doesn't involve implementing UTF-8 yourself:

function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
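
One way to check the result is to print the returned bytes with bin2hex. The sketch below repeats the unichr helper so it runs on its own, and it assumes the iconv extension is loaded; the second code point (U+20AC, the euro sign) is only there as an extra illustration.

```php
<?php
// Same helper as above: pack the code point as a 32-bit little-endian
// integer, then let iconv convert it from UCS-4LE to UTF-8.
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}

echo "Section sign: " . unichr(0xA7) . "\n";
echo bin2hex(unichr(0xA7)) . "\n";   // c2a7
echo bin2hex(unichr(0x20AC)) . "\n"; // e282ac
```

Seeing c2a7 confirms that the two-byte UTF-8 sequence was produced, independent of how the terminal renders it.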

Answer 2

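Apart from the mb_ functions and iconv, PHP has no knowledge of Unicode. You'll have to UTF-8 encode the character yourself.

For that, Wikipedia has an excellent overview of how UTF-8 is structured. Here's a quick, dirty and untested function based on that article: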

function codepointToUtf8($codepoint)
{
    if ($codepoint <= 0x7F) // U+0000-U+007F - 1 byte
        return chr($codepoint);
    if ($codepoint <= 0x7FF) // U+0080-U+07FF - 2 bytes
        return chr(0xC0 | ($codepoint >> 6)).chr(0x80 | ($codepoint & 0x3F));
    if ($codepoint <= 0xFFFF) // U+0800-U+FFFF - 3 bytes
        return chr(0xE0 | ($codepoint >> 12)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
    // U+010000-U+10FFFF - 4 bytes
    return chr(0xF0 | ($codepoint >> 18)).chr(0x80 | (($codepoint >> 12) & 0x3F)).chr(0x80 | (($codepoint >> 6) & 0x3F)).chr(0x80 | ($codepoint & 0x3F));
}
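
To make the bit arithmetic of the two-byte case concrete, here is a small worked example for U+00A7. It is only a sketch of that one branch; the variable names $lead and $cont are illustrative and not part of the function above.

```php
<?php
$codepoint = 0xA7;                      // binary 1010 0111

// Lead byte: marker 110xxxxx carrying the bits above the low six.
$lead = 0xC0 | ($codepoint >> 6);       // 0xC0 | 0x02 = 0xC2
// Continuation byte: marker 10xxxxxx carrying the low six bits.
$cont = 0x80 | ($codepoint & 0x3F);     // 0x80 | 0x27 = 0xA7

printf("%02X %02X\n", $lead, $cont);    // C2 A7
echo "Section sign: " . chr($lead) . chr($cont) . "\n";
```

The second byte happening to equal 0xA7 is a coincidence of this code point; the lead byte 0xC2 is what a bare chr(0xA7) is missing.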

Answer 3

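Don't forget that UTF-8 is a variable-length encoding.

§ is not among the first 128 (ASCII) characters that UTF-8 can represent in a single byte. § is a multibyte character in UTF-8, prefixed with a C2 byte, which marks the first byte of a two-byte sequence. This should work: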

echo "Section sign: ".chr(0xC2).chr(0xA7)."\n";
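
The difference between the two byte strings can be checked directly. The sketch below uses a hypothetical helper, isValidUtf8, which relies on PCRE rejecting malformed input when the /u modifier is set; it illustrates why the terminal shows ? for the lone byte and is not the only way to validate UTF-8.

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
// preg_match() with the /u modifier fails on invalid UTF-8 subjects.
function isValidUtf8($s) {
    return preg_match('//u', $s) === 1;
}

$single = chr(0xA7);              // what chr(167) produces: one byte
$pair   = chr(0xC2) . chr(0xA7);  // the UTF-8 encoding of U+00A7

echo bin2hex($single), ' valid: ', var_export(isValidUtf8($single), true), "\n";
echo bin2hex($pair), ' valid: ', var_export(isValidUtf8($pair), true), "\n";
```

A byte of the form 10xxxxxx, such as A7, is a continuation byte, so on its own it is not a complete UTF-8 character.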

Answer 4

chr

(PHP 4, PHP 5)

chr — Return a specific character

 Description

string chr ( int $ascii )
Returns a one-character string containing the character specified by ascii.

This function complements ord().

The important word here is ascii :) Try this:

function uchr ($codes) {
    // Accept either an array of code points or a list of scalar arguments
    if (is_scalar($codes)) $codes= func_get_args();
    $str= '';
    foreach ($codes as $code) $str.= html_entity_decode('&#'.$code.';',ENT_NOQUOTES,'UTF-8');
    return $str;
}
echo "Section sign: ".uchr(167)."\n"; // Using the uchr function
echo "Section sign: ".uchr(0xA7)."\n";
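
The same idea also works with hexadecimal numeric entities, which can be handy when the code point is already written in hex. This is a minimal sketch that calls html_entity_decode directly; the variable names are illustrative.

```php
<?php
// Decimal numeric character reference for U+00A7.
$fromDecimal = html_entity_decode('&#167;', ENT_NOQUOTES, 'UTF-8');
// Hexadecimal numeric character reference for the same code point.
$fromHex = html_entity_decode('&#xA7;', ENT_NOQUOTES, 'UTF-8');

echo bin2hex($fromDecimal), "\n"; // c2a7
echo bin2hex($fromHex), "\n";     // c2a7
```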

Answer 5

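I know I'm reopening an old, solved question, but since I stumbled on this thread while looking for help, I thought I'd share what I ended up with. The original asker may be interested in refactoring his/her code.

Manually re-programming ASCII-to-Unicode conversion is like reinventing the wheel, not to mention the potential for bugs and performance problems.

The best solution I found is to use: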

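pack, to create the value from the input data, using the appropriate format code to get the right amount of data; usually pack("H*", <input data>) to read hex values
mb_convert_encoding, to convert the ASCII string to a Unicode string with mb_convert_encoding(<ASCII string>, "UTF-8"). If the input string is not recognized correctly, the third parameter of this function lets you specify the input encoding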
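
The two steps above can be sketched as follows. This assumes the mbstring extension is available and that the code point is supplied as four hex digits, so the packed bytes can be treated as UTF-16BE; the explicit third argument is my choice for this example, since the answer does not name a specific input encoding.

```php
<?php
// Four hex digits -> two raw bytes 00 A7, i.e. U+00A7 in UTF-16BE.
$raw = pack('H*', '00a7');

// Convert to UTF-8, naming the input encoding explicitly
// (assumption: the packed bytes are big-endian UTF-16).
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'UTF-16BE');

echo "Section sign: " . $utf8 . "\n";
echo bin2hex($utf8) . "\n"; // c2a7
```

Stating the source encoding avoids relying on mbstring's default internal encoding, which may differ between installations.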

How does PHP construct a Unicode string?
