PHP 7 mb_ (multibyte) functions are about 60% slower than in 5.3 (Windows only)

Time: 2017-07-11 07:26:21

Tags: php performance php-7 php-7.1

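My application makes extensive use of the mb_ string functions, and switching to PHP 7 made the whole application slower. I traced the problem to the mb_ string functions. Here are the benchmark code and the results: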

$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
    for ($i=0; $i<100000; $i++) {
        $a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
    }
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_strlen: " . $total_time*1000 ." milliseconds<br/>";

$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
    for ($i=0; $i<100000; $i++) {
        $a = mb_stripos("fdsfdssdfoifjosdifjosdifjosdij:ά", "α", 0, "UTF-8");
    }
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_stripos: " . $total_time*1000 ." milliseconds<br/>";


$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
    for ($i=0; $i<100000; $i++) {
        $a = mb_substr("fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1, "UTF-8");
    }
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_substr: " . $total_time*1000 ." milliseconds<br/>";

The platform is Windows 7 64-bit with IIS 7.5:

php 5.3.28
mb_strlen: 250 milliseconds
mb_stripos: 3078.1 milliseconds
mb_substr: 281.3 milliseconds

php 7.1.1
mb_strlen: 406.3 milliseconds
mb_stripos: 4796.9 milliseconds
mb_substr: 421.9 milliseconds
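For reference, the relative slowdown implied by these figures can be worked out with a few lines of arithmetic. This is only a sketch: the timings are copied from the results above, and slowdown_percent() is a hypothetical helper name, not a PHP built-in. The three functions come out between about 50% and 63% slower.

```php
<?php
// Timings in milliseconds, copied from the results above.
$php53 = ['mb_strlen' => 250.0, 'mb_stripos' => 3078.1, 'mb_substr' => 281.3];
$php71 = ['mb_strlen' => 406.3, 'mb_stripos' => 4796.9, 'mb_substr' => 421.9];

// Hypothetical helper: how much slower $new is than $old, in percent.
function slowdown_percent(float $old, float $new): float {
    return ($new / $old - 1) * 100;
}

foreach ($php53 as $name => $old) {
    printf("%s: %.1f%% slower\n", $name, slowdown_percent($old, $php71[$name]));
}
```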

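I don't know whether my setup is wrong or something else is going on, but it seems incredible that the multibyte functions should be this much slower. Any ideas on why this happens and how to fix it? Thanks in advance.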

Edit: as apokryfos' comment suggests, this may be a Windows-only problem.

2 Answers:

Answer 0: (score: 4)

I can confirm that your results are reproducible on Windows 7. After some experimenting I found a quick fix that, IMO, shouldn't even make a difference.

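As the mb_strlen() function signature shows, if the encoding parameter is omitted, the internal encoding is used. The same applies to the other functions you use.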

mixed mb_strlen ( string $str [, string $encoding = mb_internal_encoding() ] )

What I find odd is that if you set the internal encoding to UTF-8 by calling mb_internal_encoding("UTF-8") and omit the encoding parameter, the functions become faster.

PHP 5.5 results:

5.5.12

with encoding parameter:
- mb_strlen: 172 ms, result: 5
- mb_substr: 218 ms, result: う
- mb_strpos: 218 ms, result: 3
- mb_stripos: 1,669 ms, result: 3
- mb_strrpos: 234 ms, result: 3
- mb_strripos: 1,685 ms, result: 3

with internal encoding:
- mb_strlen: 47 ms, result: 5
- mb_substr: 78 ms, result: う
- mb_strpos: 62 ms, result: 3
- mb_stripos: 1,669 ms, result: 3
- mb_strrpos: 94 ms, result: 3
- mb_strripos: 1,669 ms, result: 3

PHP 7.0 results:

7.0.12

with encoding parameter:
- mb_strlen: 640 ms, result: 5
- mb_substr: 702 ms, result: う
- mb_strpos: 686 ms, result: 3
- mb_stripos: 7,067 ms, result: 3
- mb_strrpos: 749 ms, result: 3
- mb_strripos: 7,130 ms, result: 3

with internal encoding:
- mb_strlen: 31 ms, result: 5
- mb_substr: 31 ms, result: う
- mb_strpos: 47 ms, result: 3
- mb_stripos: 7,270 ms, result: 3
- mb_strrpos: 62 ms, result: 3
- mb_strripos: 7,116 ms, result: 3

Unfortunately this quick fix is not perfect: mb_stripos() and mb_strripos() do not seem to be affected. They are still slow.

Here is the code (shortened):

echo PHP_VERSION."\n";
echo "\nwith encoding parameter:\n";

$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
    $n = mb_strlen("あえいおう","UTF-8");
}
$t = microtime(true)*1000-$t;
echo "- mb_strlen: ".number_format($t)." ms, result: {$n}\n";

$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
    $n = mb_substr("あえいおう",-1,1,"UTF-8");
}
$t = microtime(true)*1000-$t;
echo "- mb_substr: ".number_format($t)." ms, result: {$n}\n";

//set internal encoding
//and omit encoding parameter

mb_internal_encoding("UTF-8");
echo "\nwith internal encoding:\n";

$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
    $n = mb_strlen("あえいおう");
}
$t = microtime(true)*1000-$t;
echo "- mb_strlen: ".number_format($t)." ms, result: {$n}\n";

$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
    $n = mb_substr("あえいおう",-1,1);
}
$t = microtime(true)*1000-$t;
echo "- mb_substr: ".number_format($t)." ms, result: {$n}\n";

Answer 1: (score: 3)

This sounds like a "performance regression" bug. You should file a bug report at bugs.php.net so the PHP core developers can take a look at it.

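In the meantime, I noticed that your snippet only uses UTF-8. As long as you use UTF-8 exclusively, you may be able to speed things up with the preg_ functions, which support only one Unicode character set: UTF-8. Here is my attempt: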

function _mb_strlen(string $str, string $encoding = 'UTF-8'): int {
    assert ( $encoding === 'UTF-8' );
    // Match the last character. Caveat: PREG_OFFSET_CAPTURE reports a byte offset,
    // so this equals the character count only if all earlier characters are single-byte.
    preg_match ( '/.$/u', $str, $matches, PREG_OFFSET_CAPTURE );
    return empty ( $matches ) ? 0 : ($matches [0] [1]) + 1;
}
function _mb_stripos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8') {
    assert ( $encoding === 'UTF-8' );
    if ($offset !== 0) {
        throw new LogicException ( 'NOT IMPLEMENTED' );
    }
    // Quote the needle, including the '/' delimiter. The returned offset is in bytes.
    preg_match ( '/' . preg_quote ( $needle, '/' ) . '/ui', $haystack, $matches, PREG_OFFSET_CAPTURE );
    return empty ( $matches ) ? false : $matches [0] [1];
}
function _mb_substr(string $str, int $start, int $length = NULL, string $encoding = 'UTF-8'): string {
    assert ( $encoding === 'UTF-8' );
    // Build a pattern that skips $start characters, then captures up to $length characters.
    if ($start < 0) {
        throw new LogicException ( 'NOT IMPLEMENTED' );
    } elseif ($start > 0) {
        $rex = '/.{' . $start . '}(.{0,';
    } else {
        $rex = '/(.{0,';
    }
    // A NULL length leaves the upper bound open: (.{0,})
    if ($length !== NULL) {
        $rex .= $length;
    }
    $rex .= '})/u';
    preg_match ( $rex, $str, $matches );
    return empty ( $matches ) ? '' : $matches [1];
}

Here are my benchmark results with 100,000 iterations on PHP 7.0, Debian 9 Linux (kernel 4.9):

mb_strlen got slower, from about 60 ms to 100 ms

mb_stripos got much faster, from about 1400 ms to 75 ms

mb_substr got a lot slower, from about 47 ms to about 800 ms

  • But I suggest you rerun these tests on Windows since, as you said, you suspect this may be a Windows-only issue

Also note that these functions are not feature-complete, as you can see from the LogicException they throw.
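Beyond the unimplemented cases, one more thing to watch for with preg_-based replacements is that PREG_OFFSET_CAPTURE reports offsets in bytes, not characters, so the results only agree with the mb_ functions when the text before the match is single-byte. Below is a minimal sketch of character-accurate variants, assuming valid UTF-8 input. The names u8_strlen and u8_stripos are hypothetical, and these variants were not part of the benchmark above, so their speed is unknown.

```php
<?php
// Hypothetical helpers, UTF-8 input only. Not benchmarked.

// Count characters: under /u each '.' match is one UTF-8 character,
// and /s lets '.' match newlines as well.
function u8_strlen(string $str): int {
    $count = preg_match_all('/./su', $str);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $count;
}

// Case-insensitive search that returns a character offset, or false if not found.
function u8_stripos(string $haystack, string $needle) {
    $pattern = '/' . preg_quote($needle, '/') . '/ui';
    if (!preg_match($pattern, $haystack, $m, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    // $m[0][1] is a byte offset; count the characters that precede it.
    return u8_strlen(substr($haystack, 0, $m[0][1]));
}
```

The extra preg_match_all() pass in u8_stripos costs time proportional to the length of the prefix, so whether this still beats mb_stripos would need to be measured on the target platform.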

Also note that, because of a preg_ limitation, I had to limit the mb_substr benchmark to 65,000 iterations

for($i = 0; $i < 65000; $i ++) {
    $a = mb_substr ( "fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1, "UTF-8" );
}

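because if you ask preg to skip past more than about 65,000 characters, it errors out (PCRE caps a repeat count such as .{n} at 65,535)...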
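One way to sidestep the repeat-count cap, offered here as an untested-for-speed sketch rather than something from the benchmark, is to split the string into characters with preg_split() and slice the resulting array. The name u8_substr is hypothetical, and the input is assumed to be valid UTF-8. As a side effect, negative start values work too, because array_slice() handles them.

```php
<?php
// Hypothetical helper, UTF-8 input only. Not benchmarked.
function u8_substr(string $str, int $start, int $length = null): string {
    // An empty pattern with /u splits between every UTF-8 character.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // array_slice() accepts negative offsets and a null length (slice to the end).
    return implode('', array_slice($chars, $start, $length));
}

echo u8_substr("あえいおう", 1, 2), "\n";
```

Building an array of every character is likely to be expensive on long strings, so this trades the 65,535 limit for extra memory and time.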

Also note that your benchmark code could be much simpler. All of this

$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
    for ($i=0; $i<100000; $i++) {
        $a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
    }
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_strlen: " . $total_time*1000 ." milliseconds<br/>";

can simply be replaced with

$starttime=microtime(true);
    for ($i=0; $i<100000; $i++) {
        $a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
    }
$endtime=microtime(true);
echo "mb_strlen: " . number_format(($endtime-$starttime),3) ." seconds<br/>";

which outputs something like: mb_strlen: 0.085 seconds (meaning about 85 milliseconds)

Or, to print milliseconds instead:

echo "mb_strlen: " . number_format(($endtime - $starttime) * 1000, 2) . " milliseconds<br/>";
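Taking the simplification one step further, the timing loop can be pulled into a small helper so each function under test is a single line. This is only a sketch: bench() is a hypothetical name, and each call goes through a closure, which adds a constant overhead to every measurement.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns
// the elapsed wall-clock time in milliseconds.
function bench(callable $fn, int $iterations = 100000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($i);
    }
    return (microtime(true) - $start) * 1000;
}

$subject = "fdsfdssdfoifjosdifjosdifjosdij:ά";
$tests = [
    'mb_strlen'  => function ($i) use ($subject) { return mb_strlen($subject, "UTF-8"); },
    'mb_stripos' => function ($i) use ($subject) { return mb_stripos($subject, "α", 0, "UTF-8"); },
    'mb_substr'  => function ($i) use ($subject) { return mb_substr($subject, $i, 1, "UTF-8"); },
];

foreach ($tests as $label => $fn) {
    echo $label . ": " . number_format(bench($fn), 2) . " milliseconds<br/>\n";
}
```

Because the closure overhead is the same on every PHP version, it should not distort a comparison between versions, though the absolute numbers will be slightly higher than with the inline loops.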

(My guess is that it has to do with realloc() performance, where Linux trounces Windows, but I have no proof.)