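My application makes heavy use of the mb_ string functions, and switching to PHP 7 slowed the whole application down. I traced the problem to the mb_ string functions. Here is the benchmark code and the results: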
$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
for ($i=0; $i<100000; $i++) {
$a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
}
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_strlen: " . $total_time*1000 ." milliseconds<br/>";
$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
for ($i=0; $i<100000; $i++) {
$a = mb_stripos("fdsfdssdfoifjosdifjosdifjosdij:ά", "α", 0, "UTF-8");
}
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_stripos: " . $total_time*1000 ." milliseconds<br/>";
$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
for ($i=0; $i<100000; $i++) {
$a = mb_substr("fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1, "UTF-8");
}
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_substr: " . $total_time*1000 ." milliseconds<br/>";
The platform is Windows 7 64-bit with IIS 7.5:
php 5.3.28
mb_strlen: 250 milliseconds
mb_stripos: 3078.1 milliseconds
mb_substr: 281.3 milliseconds
php 7.1.1
mb_strlen: 406.3 milliseconds
mb_stripos: 4796.9 milliseconds
mb_substr: 421.9 milliseconds
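I don't know whether my setup is wrong or what, but it seems incredible that the multibyte functions should have become slower. Any ideas about why this happens and how to fix it? Thanks in advance.
Edit: as apokryfos' comment suggests, this may be a Windows-only problem.
Answer 0 (score: 4)
I can confirm that your results are reproducible on Windows 7. After some experimenting I found a quick fix that, IMO, shouldn't even make a difference.
As you can see from the mb_strlen() function signature, the internal encoding is used if the encoding parameter is omitted. The same applies to the other functions you are using.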
mixed mb_strlen ( string $str [, string $encoding = mb_internal_encoding() ] )
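What I find strange is that if you set the internal encoding to UTF-8 by calling mb_internal_encoding("UTF-8")
and omit the encoding parameter,
the functions become much faster.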
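As a minimal sketch of this workaround (assuming every string in the application really is UTF-8), the internal encoding can be set once at startup and the encoding argument dropped at the call sites. The result is the same either way; only the call form differs:

```php
<?php
// Set once, e.g. in a bootstrap file (assumes the whole app works in UTF-8).
mb_internal_encoding('UTF-8');

$s = "あえいおう";

// Explicit encoding argument (the slow form in these measurements).
$explicit = mb_strlen($s, 'UTF-8');

// Encoding argument omitted: falls back to mb_internal_encoding().
$implicit = mb_strlen($s);

var_dump($explicit === $implicit); // both calls count 5 characters
```

Whether the omitted-argument form is actually faster depends on the PHP version and platform, so it is worth measuring on your own setup.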
PHP 5.5 results:
5.5.12
with encoding parameter:
- mb_strlen: 172 ms, result: 5
- mb_substr: 218 ms, result: う
- mb_strpos: 218 ms, result: 3
- mb_stripos: 1,669 ms, result: 3
- mb_strrpos: 234 ms, result: 3
- mb_strripos: 1,685 ms, result: 3
with internal encoding:
- mb_strlen: 47 ms, result: 5
- mb_substr: 78 ms, result: う
- mb_strpos: 62 ms, result: 3
- mb_stripos: 1,669 ms, result: 3
- mb_strrpos: 94 ms, result: 3
- mb_strripos: 1,669 ms, result: 3
PHP 7.0 results:
7.0.12
with encoding parameter:
- mb_strlen: 640 ms, result: 5
- mb_substr: 702 ms, result: う
- mb_strpos: 686 ms, result: 3
- mb_stripos: 7,067 ms, result: 3
- mb_strrpos: 749 ms, result: 3
- mb_strripos: 7,130 ms, result: 3
with internal encoding:
- mb_strlen: 31 ms, result: 5
- mb_substr: 31 ms, result: う
- mb_strpos: 47 ms, result: 3
- mb_stripos: 7,270 ms, result: 3
- mb_strrpos: 62 ms, result: 3
- mb_strripos: 7,116 ms, result: 3
Unfortunately this quick fix isn't perfect: mb_stripos()
and mb_strripos()
don't seem to be affected.
They are still slow.
Here is the code (shortened):
echo PHP_VERSION."\n";
echo "\nwith encoding parameter:\n";
$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
$n = mb_strlen("あえいおう","UTF-8");
}
$t = microtime(true)*1000-$t;
echo "- mb_strlen: ".number_format($t)." ms, result: {$n}\n";
$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
$n = mb_substr("あえいおう",-1,1,"UTF-8");
}
$t = microtime(true)*1000-$t;
echo "- mb_substr: ".number_format($t)." ms, result: {$n}\n";
//set internal encoding
//and omit encoding parameter
mb_internal_encoding("UTF-8");
echo "\nwith internal encoding:\n";
$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
$n = mb_strlen("あえいおう");
}
$t = microtime(true)*1000-$t;
echo "- mb_strlen: ".number_format($t)." ms, result: {$n}\n";
$t = microtime(true)*1000;
for($i=0; $i<100000; $i++){
$n = mb_substr("あえいおう",-1,1);
}
$t = microtime(true)*1000-$t;
echo "- mb_substr: ".number_format($t)." ms, result: {$n}\n";
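Answer 1 (score: 3)
This sounds like a "performance regression" bug. You should file a bug report at bugs.php.net so the PHP core developers can take a look at it.
In the meantime, I noticed that your snippets only ever use UTF-8. As long as you work exclusively with UTF-8, you can try to speed things up with the preg_ functions, which support only one Unicode character set: UTF-8.
Here is my attempt: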
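function _mb_strlen(string $str, string $encoding = 'UTF-8'): int {
    assert($encoding === 'UTF-8');
    // Count code points: with the u flag each . matches one UTF-8 character,
    // and the s flag lets . match newlines as well.
    // (PREG_OFFSET_CAPTURE is avoided here because it reports byte offsets.)
    return (int) preg_match_all('/./us', $str);
}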
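function _mb_stripos(string $haystack, string $needle, int $offset = 0, string $encoding = 'UTF-8') {
    assert($encoding === 'UTF-8');
    if ($offset !== 0) {
        throw new LogicException('NOT IMPLEMENTED');
    }
    // Quote the needle, including the / delimiter, so it is matched literally.
    if (!preg_match('/' . preg_quote($needle, '/') . '/ui', $haystack, $matches, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    // PREG_OFFSET_CAPTURE gives a byte offset; count the code points before it
    // to get the character offset that mb_stripos() would return.
    return preg_match_all('/./us', substr($haystack, 0, $matches[0][1]));
}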
function _mb_substr(string $str, int $start, int $length = NULL, string $encoding = 'UTF-8'): string {
    assert($encoding === 'UTF-8');
    if ($start < 0 || ($length !== NULL && $length < 0)) {
        throw new LogicException('NOT IMPLEMENTED');
    }
    // Skip $start characters from the beginning of the string, then capture
    // up to $length characters (or the rest when $length is NULL).
    // The s flag lets . match newlines; ^ keeps the match from sliding forward.
    $rex = '/^.{' . $start . '}(.{0,' . ($length !== NULL ? $length : '') . '})/us';
    preg_match($rex, $str, $matches);
    return empty($matches) ? '' : $matches[1];
}
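Here are my benchmark results for 100,000 iterations on PHP 7.0 on Debian 9 Linux (kernel 4.9):
mb_strlen got slower, from about 60 ms to about 100 ms
mb_stripos got much faster, from about 1400 ms to about 75 ms
mb_substr got much slower, from about 47 ms to about 800 ms
Also note that these functions are not feature-complete, as you can see from the LogicException they throw.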
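Because these are partial reimplementations, it may be worth cross-checking any replacement against the original mb_ function on multibyte input before relying on it. The following is only a sketch: find_mismatches() is a hypothetical helper (not part of PHP or of the code above), and the sample inputs are arbitrary.

```php
<?php
// Hypothetical helper: returns the inputs on which two callables disagree.
function find_mismatches(callable $expected, callable $candidate, array $inputs): array {
    $bad = [];
    foreach ($inputs as $input) {
        if ($expected($input) !== $candidate($input)) {
            $bad[] = $input;
        }
    }
    return $bad;
}

$inputs = ['', 'abc', "あえいおう", "fdsfdssdfoifjosdifjosdifjosdij:ά", "a\nb"];

$reference = function (string $s) { return mb_strlen($s, 'UTF-8'); };
// Counting code points with preg_match_all should agree on valid UTF-8.
$byCodePoints = function (string $s) { return preg_match_all('/./us', $s); };
// strlen() counts bytes, so it should disagree on the multibyte inputs.
$byBytes = function (string $s) { return strlen($s); };

var_dump(find_mismatches($reference, $byCodePoints, $inputs));   // expect an empty array
var_dump(count(find_mismatches($reference, $byBytes, $inputs))); // the two multibyte strings
```

The same helper can compare a position-returning replacement with mb_stripos(), which is where byte offsets versus character offsets tend to show up.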
Also note that, because of a preg_ limitation, I had to limit the mb_substr benchmark to 65,000 iterations
for($i = 0; $i < 65000; $i ++) {
$a = mb_substr ( "fdsfdssdfoifjosdifjosdifjosdij:ά", $i, 1, "UTF-8" );
}
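because preg raises an error if you ask it to match a run of more than about 65,000 characters (the repetition count in a {n} quantifier is capped at 65,535)...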
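A small sketch of that limit, assuming a PCRE build with the usual 65535 cap on repetition counts. The patterns are illustrative and not taken from the functions above:

```php
<?php
// A repetition count above 65535 makes the pattern fail to compile.
// The @ only hides the compile warning; preg_match() then returns false.
$tooBig = @preg_match('/.{70000}/us', 'abc');
var_dump($tooBig); // false

// A count within the limit compiles and simply fails to match a short subject.
$ok = preg_match('/.{65000}/us', 'abc');
var_dump($ok); // int(0)
```

Since false and 0 are both falsy, a strict comparison (=== false) is needed to tell a failed compile apart from "no match".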
Also note that your benchmark code can be simplified considerably. All of this
$time = microtime();
$time = explode(' ', $time);
$start = $time[1] + $time[0];
$startms = $time[0];
for ($i=0; $i<100000; $i++) {
$a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
}
$time = microtime();
$time = explode(' ', $time);
$finish = $time[1] + $time[0];
$finishms = $time[0];
$total_time = round(($finish - $start), 4);
echo "mb_strlen: " . $total_time*1000 ." milliseconds<br/>";
can simply be replaced with
$starttime=microtime(true);
for ($i=0; $i<100000; $i++) {
$a = mb_strlen("fdsfdssdfoifjosdifjosdifjosdij:ά", "UTF-8");
}
$endtime=microtime(true);
echo "mb_strlen: " . number_format(($endtime-$starttime),3) ." seconds<br/>";
which prints something like: mb_strlen: 0.085 seconds
(meaning about 85 milliseconds)
or
echo "mb_strlen: " . number_format(($endtime - $starttime) * 1000, 2) . " milliseconds<br/>";
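If you want to go one step further, the timing boilerplate can be folded into a small helper so each test is a single line. This is only a sketch: bench() is a hypothetical name of my own, and the closure call adds a little overhead to every iteration, so the absolute numbers will differ slightly from the inline loops.

```php
<?php
// Hypothetical helper (not from the original post): runs $fn $iterations
// times and returns the elapsed wall-clock time in milliseconds.
function bench(callable $fn, int $iterations = 100000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($i);
    }
    return (microtime(true) - $start) * 1000;
}

$s = "fdsfdssdfoifjosdifjosdifjosdij:ά";
$tests = [
    'mb_strlen'  => function ($i) use ($s) { return mb_strlen($s, "UTF-8"); },
    'mb_stripos' => function ($i) use ($s) { return mb_stripos($s, "α", 0, "UTF-8"); },
    'mb_substr'  => function ($i) use ($s) { return mb_substr($s, $i, 1, "UTF-8"); },
];
foreach ($tests as $name => $fn) {
    echo $name . ": " . number_format(bench($fn), 2) . " milliseconds<br/>";
}
```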
(My guess is that it has something to do with realloc() performance, where Linux trounces Windows, but I have no proof.)