Question

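I have found that the u modifier sometimes helps when working with UTF-8 strings, but on my Linux server the code below replaces umlauts with -, which my Windows server does not do.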

mb_internal_encoding('UTF-8');
function clean($string) {
    return preg_replace('/[^[:alnum:]]/ui', '-', $string);
}
echo clean("Test: föG");

On Linux: Test--f-G

On Windows (as it should be): Test--föG

Answer 1

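From the PHP documentation of the PCRE module:

In UTF-8 mode, characters with values greater than 128 do not match any of the POSIX character classes.

This is probably for efficiency reasons: there are a lot of Unicode characters. You can write your regex using Unicode character properties instead of POSIX character classes, but it will be somewhat slower.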

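<?php
// mb_internal_encoding() only affects the mbstring functions, not preg_*.
mb_internal_encoding('UTF-8');
function clean($string) {
    // \p{L} matches any Unicode letter, \p{N} any Unicode number.
    return preg_replace('/[^\\p{L}\\p{N}]/ui', '-', $string);
}
echo clean("Test: föG"); // Test--föG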
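
To make the difference easier to see, here is a minimal sketch that runs both patterns side by side. The helper names clean_posix and clean_unicode are made up for illustration, and the file is assumed to be saved as UTF-8. The output of the POSIX variant is deliberately not shown, since it depends on the PCRE build that PHP is linked against.

```php
<?php
// Hypothetical helper: POSIX class, as in the question.
// Whether non-ASCII letters match [:alnum:] depends on the PCRE build.
function clean_posix(string $s): string {
    return preg_replace('/[^[:alnum:]]/u', '-', $s);
}

// Hypothetical helper: Unicode properties, as in the answer.
// \p{L} = any letter, \p{N} = any number, regardless of script.
function clean_unicode(string $s): string {
    return preg_replace('/[^\p{L}\p{N}]/u', '-', $s);
}

foreach (["Test: föG", "naïve café 123"] as $input) {
    printf(
        "%s | posix: %s | unicode: %s\n",
        $input,
        clean_posix($input),
        clean_unicode($input)
    );
}
```

The Unicode-property variant should give Test--föG and naïve-café-123 on any platform whose PCRE has Unicode property support, so it is the one to rely on if the code has to behave the same on both servers.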
