使用mbstring将PHP引用转换为PHP中的UTF-8字符

时间:2011-11-16 16:45:53

标签: php encoding utf-8 character-encoding mbstring

我在数据库中有一组数据,这些数据已经输入了unicode字符,但它们被解释为一个字符串。也就是说,应该有一个撇号我实际上有\u2019

所以我现在需要将其转换为字符表示形式,即。首先,很容易将字符串更改为其实体版本:’,然后我需要将其转换为正确的UTF-8多字节字符串。

我试图以多种方式做到这一点;在我的本地服务器上,我可以使用preg_match函数来提取字符,然后将每个字符传递给以下函数:

mb_convert_encoding($string, "UTF-8", "HTML-ENTITIES");

听起来很明智,而且没有问题。在浏览器中关闭UTF-8字符集显示,当通过浏览器默认编码读取时,它实际上已转换为’

但是,在我的生产环境中运行时完全相同的代码会在呈现为UTF-8时生成可怕的“缺失符号”框。关闭UTF-8并生成任何字节流呈现为ò°‘£。它似乎输出4个字节而不是3个,我不知道这是否相关,因为我没有很好地阅读字符编码。

我认为问题在于我的mbstring设置。以下是我本地服务器的mbstring设置:

Multibyte Support   enabled
Multibyte string engine libmbfl
HTTP input encoding translation disabled
Multibyte (japanese) regex support  enabled
Multibyte regex (oniguruma) version 4.7.1
mbstring.detect_order   no value    no value
mbstring.encoding_translation   Off Off
mbstring.func_overload  0   0
mbstring.http_input auto    auto
mbstring.http_output    UTF-8   UTF-8
mbstring.http_output_conv_mimetypes ^(text/|application/xhtml\+xml)^(text/|application/xhtml\+xml)
mbstring.internal_encoding  UTF-8   UTF-8
mbstring.language   neutral neutral
mbstring.strict_detection   Off Off
mbstring.substitute_character   no value    no value

我的生产环境存在一些差异:

Multibyte Support   enabled
Multibyte string engine libmbfl
Multibyte (japanese) regex support  enabled
Multibyte regex (oniguruma) version 3.7.1
mbstring.detect_order   no value    no value
mbstring.encoding_translation   Off Off
mbstring.func_overload  0   0
mbstring.http_input auto    auto
mbstring.http_output    UTF-8   UTF-8
mbstring.internal_encoding  UTF-8   UTF-8
mbstring.language   neutral neutral
mbstring.strict_detection   Off Off
mbstring.substitute_character   no value    no value

任何人都能看到我做错了什么?

2 个答案:

答案 0 :(得分:1)

我想您正在寻找的是ordchr的多字节版本。

我为此写了以下polyfill

if (!function_exists('mb_internal_encoding')) {
    function mb_internal_encoding($encoding = NULL) {
        return ($from_encoding === NULL) ? iconv_get_encoding() : iconv_set_encoding($encoding);
    }
}

if (!function_exists('mb_convert_encoding')) {
    function mb_convert_encoding($str, $to_encoding, $from_encoding = NULL) {
        return iconv(($from_encoding === NULL) ? mb_internal_encoding() : $from_encoding, $to_encoding, $str);
    }
}

if (!function_exists('mb_chr')) {
    function mb_chr($ord, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            return pack("N", $ord);
        } else {
            return mb_convert_encoding(mb_chr($ord, 'UCS-4BE'), $encoding, 'UCS-4BE');
        }
    }
}

if (!function_exists('mb_ord')) {
    function mb_ord($char, $encoding = 'UTF-8') {
        if ($encoding === 'UCS-4BE') {
            list(, $ord) = (strlen($char) === 4) ? @unpack('N', $char) : @unpack('n', $char);
            return $ord;
        } else {
            return mb_ord(mb_convert_encoding($char, 'UCS-4BE', $encoding), 'UCS-4BE');
        }
    }
}

演示

echo "\nGet string from numeric DEC value\n";
var_dump(mb_chr(25105));
var_dump(mb_chr(22909));

echo "\nGet string from numeric HEX value\n";
var_dump(mb_chr(0x6211));
var_dump(mb_chr(0x597D));

echo "\nGet numeric value of character as DEC int\n";
var_dump(mb_ord('我'));
var_dump(mb_ord('好'));

echo "\nGet numeric value of character as HEX string\n";
var_dump(dechex(mb_ord('我')));
var_dump(dechex(mb_ord('好')));

输出:

Get string from numeric DEC value
string(3) "我"
string(3) "好"

Get string from numeric HEX value
string(3) "我"
string(3) "好"

Get numeric value of character as DEC string
int(25105)
int(22909)

Get numeric value of character as HEX string
string(4) "6211"
string(4) "597d"

答案 1 :(得分:0)

看看这是否可以帮助您:hex2ascii and ascii2hex

于2012年9月19日增加:

function ascii2hex($ascii)
{
    $hex = '';
    for ($i = 0; $i < strlen($ascii); $i++)
    {
        $byte = strtoupper(dechex(ord($ascii{$i})));
        $byte = str_repeat('0', 2 - strlen($byte)).$byte;
        $hex .= $byte." ";
    }
    return $hex;
}

function hex2ascii($hex)
{
    $ascii = '';
    $hex = str_replace(" ", "", $hex);
    for($i = 0; $i < strlen($hex); $i = $i+2)
        $ascii .= chr(hexdec(substr($hex, $i, 2)));

    return($ascii);
}