How can I get the "rendered length" of a Unicode string containing combining characters in PHP?

Asked: 2015-04-12 11:50:03

Tags: php string unicode localization internationalization

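Given that not every sequence of Unicode combining characters has an equivalent precomposed character (NFC), is there a way to get the "rendered" length of a string in PHP, wherever that is possible and semantically meaningful?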

http://3v4l.org/L1kPl (uses the PHP 7 escape syntax)

<?php
echo $s = "\u{0071}\u{0307}\u{0323}";
echo "\n";
echo mb_strlen(Normalizer::normalize($s, Normalizer::FORM_C), "UTF-8");
// Prints 3 because this glyph has no precomposed equivalent.
// I want to get 1 instead.

What I have achieved so far: http://3v4l.org/4NSCi

<?php
echo $s = "\u{0071}\u{0307}\u{0323}";
$r = Normalizer::normalize($s, Normalizer::FORM_C);
echo mb_strlen(preg_replace("@\p{Mn}@u", "", $r), "UTF-8");

1 answer:

Answer 0 (score: 3)

You are probably looking for:

grapheme_strlen()

It requires a valid UTF-8 string. For background, see the Unicode reference on grapheme cluster boundaries.
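
As a minimal sketch of how this applies to the string from the question, the snippet below compares byte, code point, and grapheme cluster counts. It assumes the intl extension is loaded (grapheme_strlen() is part of it) and PHP 7 or later for the \u{...} escapes. The \X pattern at the end is an assumption on my part as a fallback when intl is unavailable; it is not part of the original answer.

```php
<?php
// q + COMBINING DOT ABOVE + COMBINING DOT BELOW (same string as in the question)
$s = "\u{0071}\u{0307}\u{0323}";

// Bytes in the UTF-8 encoding: 1 + 2 + 2
$bytes = strlen($s);

// Unicode code points
$codePoints = mb_strlen($s, "UTF-8");

// Grapheme clusters (user-perceived characters); needs the intl extension
$graphemes = grapheme_strlen($s);

// Possible fallback without intl: PCRE's \X matches one extended grapheme cluster
$graphemesPcre = preg_match_all('/\X/u', $s);

echo $bytes, "\n";         // 5
echo $codePoints, "\n";    // 3
echo $graphemes, "\n";     // 1
echo $graphemesPcre, "\n"; // 1
```

Note that this counts user-perceived characters, not display columns, so wide characters (for example CJK ideographs) still count as 1 each.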