PHP Multi Byte str_replace?

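Asked: 2009-09-20 14:29:32

Tags: php string replace multibyte-functions

I'm trying to replace accented characters in PHP, but I'm getting funky results. My guess is that this happens because I'm using a UTF-8 string and str_replace can't handle multibyte strings properly.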

$accents_search     = array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è',
'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø',
'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ'); 

$accents_replace    = array('a','a','a','a','a','a','a','A','A','A','A','A','e','e',
'e','e','E','E','E','E','i','i','i','i','I','I','I','I','oe','o','o','o','o','o','o',
'O','O','O','O','O','u','u','u','U','U','U','c','C','N','n'); 

$str = str_replace($accents_search, $accents_replace, $str);

The result I get:

Ørjan Nilsen -> �orjan Nilsen

Expected result:

Ørjan Nilsen -> Orjan Nilsen

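Edit: My internal character encoding is set to UTF-8 (according to mb_internal_encoding()), and the value of $str is also UTF-8, so as far as I can tell all the strings involved are UTF-8. Does str_replace() detect character sets and use them properly?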

4 answers:

Answer 0 (score: 18)

According to the PHP documentation, str_replace is binary-safe, which means it can handle UTF-8 encoded text without losing any data.
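As a quick check of that claim, the sketch below runs str_replace() on a UTF-8 string. It assumes PHP 7.0 or later, because it writes the characters as "\u{...}" escapes so that the encoding of the source file cannot affect the result. The variable names and the shortened search list are illustrative only.

```php
<?php
// Build the strings from code points so they are UTF-8 regardless of how
// this file is saved (the "\u{...}" escape needs PHP 7.0+).
$search  = array("\u{00D8}", "\u{00F8}", "\u{00E9}"); // Ø, ø, é
$replace = array('O', 'o', 'e');

$str = "\u{00D8}rjan Nilsen"; // "Ørjan Nilsen"

// str_replace() compares raw bytes. Because the search strings and the
// subject use the same encoding, the two-byte sequence for Ø is matched
// as a whole.
$result = str_replace($search, $replace, $str);

var_dump($result === 'Orjan Nilsen'); // bool(true)
```

This works because a valid UTF-8 search string can only match at character boundaries inside a valid UTF-8 subject, so a byte-wise comparison never splits a character as long as both sides really are UTF-8.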

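Answer 1 (score: 5)

It looks like the string isn't being replaced because your input encoding and the encoding of your source file don't match.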
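To illustrate the kind of mismatch meant here, the following sketch compares a UTF-8 subject with search strings given as single bytes, which is how a character literal is stored when the source file is saved in a single-byte encoding. The ISO-8859-1 case and the stray "\x98" needle are assumptions chosen for the demonstration, not a diagnosis of the asker's actual file. It assumes PHP 7.0+ and the mbstring extension.

```php
<?php
// "Ørjan Nilsen" as UTF-8: Ø is the two-byte sequence C3 98.
$str = "\u{00D8}rjan Nilsen";

// Hypothetical mismatch: the literal 'Ø' as it would be stored in a source
// file saved as ISO-8859-1, where it is the single byte D8.
$latin1Search = "\xD8";

// No match: the byte D8 never occurs in the UTF-8 subject.
$unchanged = str_replace($latin1Search, 'O', $str);
var_dump($unchanged === $str); // bool(true)

// A single-byte needle that happens to equal the second byte of Ø splits
// the character and leaves an invalid sequence behind (displayed as "�").
$broken = str_replace("\x98", 'o', $str);
var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)

// Converting the search string to the subject's encoding makes it match.
$utf8Search = mb_convert_encoding($latin1Search, 'UTF-8', 'ISO-8859-1');
$fixed = str_replace($utf8Search, 'O', $str);
var_dump($fixed === 'Orjan Nilsen'); // bool(true)
```

In practice the simpler fix is usually to save the source file itself as UTF-8, so that the literals in the search array already match the input.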

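Answer 2 (score: 3)

Diacritics can be removed using Unicode normalization form D (NFD) and Unicode character properties.

NFD converts, for example, the "ü" umlaut from "LATIN SMALL LETTER U WITH DIAERESIS" (which is a letter) into "LATIN SMALL LETTER U" (a letter) followed by "COMBINING DIAERESIS" (not a letter).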
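The decomposition described above can be observed directly. This is a minimal sketch, assuming PHP 7.0+ (for the "\u{...}" escapes) and the intl extension (for the Normalizer class); the variable names are illustrative.

```php
<?php
// ü as a single precomposed code point, U+00FC.
$composed = "\u{00FC}";

// NFD splits it into the base letter and a combining mark.
$decomposed = Normalizer::normalize($composed, Normalizer::FORM_D);

// The result is "u" followed by U+0308 COMBINING DIAERESIS.
var_dump($decomposed === "u\u{0308}"); // bool(true)

// The precomposed form is a single letter...
var_dump(preg_match('/^\pL$/u', $composed)); // int(1)

// ...while in the decomposed form only the base "u" counts as a letter.
var_dump(preg_match_all('/\pL/u', $decomposed)); // int(1)
```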

header('Content-Type: text/plain; charset=utf-8');

$test = implode('', array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è',
'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø',
'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ'));

$test = Normalizer::normalize($test, Normalizer::FORM_D);

// Remove everything that's not a "letter" or a space (e.g. diacritics)
// (see http://de2.php.net/manual/en/regexp.reference.unicode.php)
$pattern = '/[^\pL ]/u';

echo preg_replace($pattern, '', $test);

Output:

aaaaªaaAAAAAeeeeEEEEiiiiIIIIœooooºøØOOOOuuuUUUcCNn
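Note that the pattern above removes everything that is not a letter or a space, so digits and punctuation disappear as well. A possible variant, which is not part of the original answer, strips only the combining marks (general category Mn) and leaves all other characters alone. The function name strip_diacritics is hypothetical, and the sketch assumes PHP 7.0+ and the intl extension.

```php
<?php
// Hypothetical helper; requires the intl extension for Normalizer.
function strip_diacritics($text) {
    // Decompose accented letters into base letter + combining mark(s).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    // Remove only the nonspacing combining marks (category Mn).
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

// Accents are removed; digits, punctuation and spaces survive.
var_dump(strip_diacritics("Cr\u{00E8}me br\u{00FB}l\u{00E9}e, 2 x!") === 'Creme brulee, 2 x!'); // bool(true)

// Characters without a canonical decomposition, such as Ø, are unchanged.
var_dump(strip_diacritics("\u{00D8}rjan") === "\u{00D8}rjan"); // bool(true)
```

As in the output shown above, characters like Ø, ø and œ stay as they are, so they still need an explicit replacement table if plain ASCII is the goal.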

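The Normalizer class is part of the PECL intl package. (The algorithm itself isn't complicated, but it needs to load a lot of character mappings. I wrote a PHP implementation a while ago.)

(I'm answering two months late because I think this is a good technique that isn't widely known.)

Answer 3 (score: 2)

Try this function definition: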

if (!function_exists('mb_str_replace')) {
    // UTF-8-aware variant of str_replace(), built on PCRE's /u modifier.
    function mb_str_replace($search, $replace, $subject) {
        if (is_array($subject)) {
            // Recurse into each element, passing $search through unchanged
            // (casting an array of search strings to string would yield "Array").
            foreach ($subject as $key => $val) {
                $subject[$key] = mb_str_replace($search, $replace, $val);
            }
            return $subject;
        }
        // Quote each whole search string; $s[0] would quote only its first byte.
        // Closures replace create_function(), which was removed in PHP 8.0.
        $quoted = array_map(function ($s) {
            return preg_quote($s, '/');
        }, (array)$search);
        $pattern = '/(?:' . implode('|', $quoted) . ')/u';
        if (is_array($search) && is_array($replace)) {
            // Map each search string to the replacement at the same position.
            $len = min(count($search), count($replace));
            $table = array_combine(array_slice($search, 0, $len), array_slice($replace, 0, $len));
            return preg_replace_callback($pattern, function ($match) use ($table) {
                return array_key_exists($match[0], $table) ? $table[$match[0]] : $match[0];
            }, $subject);
        }
        // Single replacement string for every match.
        return preg_replace($pattern, (string)$replace, $subject);
    }
}