Is it faster to use strpos() before preg_replace()?

Date: 2015-04-19 07:16:40

Tags: php regex preg-replace strpos

Suppose we run this preg_replace over millions of post strings:

function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
}

Assuming only 10% of all posts contain links, would it be faster to check strpos($string, 'http') !== false before calling preg_replace()? If so, why? Doesn't preg_replace() perform some pre-tests internally?

2 answers:

Answer 0 (score: 4)

Surprisingly, yes!

Here is a benchmark that runs each of the two functions over 10,000,000 strings:

Test 1 - a string that matches the pattern:

"Here is a great new site to visit at http://example.com so go there now!"
  

preg_replace alone took 10.9626309872 seconds
strpos before preg_replace took 12.6124269962 seconds ← slower

Test 2 - a string that does not match the pattern:

"Here is a great new site to visit at ftp://example.com so go there now!"
  

preg_replace alone took 6.51636195183 seconds
strpos before preg_replace took 2.91205692291 seconds ← faster

Test 3 - 10% of the strings match the pattern:

"Here is a great new site to visit at ftp://example.com so go there now!" (90%)
"Here is a great new site to visit at http://example.com so go there now!" (10%)
  

preg_replace alone took 7.43295097351 seconds
strpos before preg_replace took 4.31978201866 seconds ← faster

This is only a simple benchmark on two strings, but the difference in speed is clear.
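As a rough cross-check, the Test 3 timings can be estimated by blending the Test 1 and Test 2 timings at a 10% match rate. The sketch below does that arithmetic and also estimates the match rate at which the strpos() guard would stop paying off. The helper name blend() and the variable names are hypothetical, and the result only extrapolates from the two fixed strings benchmarked above, so treat it as an approximation rather than a general rule.

```php
<?php
// Timings in seconds per 10,000,000 calls, copied from Tests 1 and 2 above.
$matchAlone   = 10.9626309872; // preg_replace only, string matches
$matchGuarded = 12.6124269962; // strpos + preg_replace, string matches
$missAlone    = 6.51636195183; // preg_replace only, string does not match
$missGuarded  = 2.91205692291; // strpos only, string does not match

// Hypothetical helper: expected total time when a fraction $p of strings match.
function blend(float $p, float $match, float $miss): float {
    return $p * $match + (1 - $p) * $miss;
}

$p = 0.10; // 10% of the strings contain a link
printf("alone:   %.2f s\n", blend($p, $matchAlone, $missAlone));     // 6.96
printf("guarded: %.2f s\n", blend($p, $matchGuarded, $missGuarded)); // 3.88

// Break-even match rate: the extra cost paid on matching strings
// equals the time saved on non-matching strings.
$extraOnMatch = $matchGuarded - $matchAlone;
$savedOnMiss  = $missAlone - $missGuarded;
$breakEven    = $savedOnMiss / ($savedOnMiss + $extraOnMatch);
printf("break-even: %.1f%%\n", 100 * $breakEven); // 68.6%
```

The blended estimates (about 6.96 s and 3.88 s) come out somewhat below the measured Test 3 figures (7.43 s and 4.32 s), presumably because of loop overhead and run-to-run noise, but the ordering is the same. On these numbers the guard would only lose once roughly two thirds of the strings contained "http".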


Here is the test harness for the "10%" case:

<?php
$string1 = "Here is a great new site to visit at http://example.com so go there now!";
$string2 = "Here is a great new site to visit at ftp://example.com so go there now!";

function makeClickableLinks1($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s);
}

function makeClickableLinks2($s) {
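    // Skip the regex when "http" is absent; return the string unchanged (not null) so callers keep their text.
    return strpos($s, 'http') !== false ? preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '<a href="$1" target="_blank">$1</a>', $s) : $s;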
}

/* Begin test harness */

$loops = 10000000;

function microtime_float() {
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

/* Test using only preg_replace */

$time_start = microtime_float();
for($i = 0; $i < $loops; $i++) {
    // Only 10% of strings will have "http"
    makeClickableLinks1($i % 10 ? $string2 : $string1);
}
$time_end = microtime_float();
$time = $time_end - $time_start;
echo "preg_replace alone took $time seconds<br/>";

/* Test using strpos before preg_replace */

$time_start = microtime_float();
for($i = 0; $i < $loops; $i++) {
    // Only 10% of strings will have "http"
    makeClickableLinks2($i % 10 ? $string2 : $string1);
}
$time_end = microtime_float();
$time = $time_end - $time_start;
echo "strpos before preg_replace took $time seconds<br/>";
?>
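Before relying on the timings, it may be worth confirming that the guarded version returns exactly what the plain version returns, since a guard that changes the output is not a fair optimization. The sketch below is one way to check that. The names linkifyPlain, linkifyGuarded, LINK_PATTERN and LINK_REPLACEMENT are hypothetical, the last two sample strings are made up for illustration, and the guarded function is assumed to return the input unchanged when "http" is absent.

```php
<?php
// Same pattern and replacement as in the harness above.
const LINK_PATTERN = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
const LINK_REPLACEMENT = '<a href="$1" target="_blank">$1</a>';

// Always runs the regex.
function linkifyPlain($s) {
    return preg_replace(LINK_PATTERN, LINK_REPLACEMENT, $s);
}

// Runs the regex only when the cheap substring test succeeds;
// otherwise returns the input string unchanged.
function linkifyGuarded($s) {
    return strpos($s, 'http') !== false
        ? preg_replace(LINK_PATTERN, LINK_REPLACEMENT, $s)
        : $s;
}

$samples = [
    "Here is a great new site to visit at http://example.com so go there now!",
    "Here is a great new site to visit at ftp://example.com so go there now!",
    "no link here, just the word http in passing", // guard passes, regex finds nothing
    "",                                            // empty post
];

// Count inputs on which the two versions disagree.
$mismatches = 0;
foreach ($samples as $s) {
    if (linkifyPlain($s) !== linkifyGuarded($s)) {
        $mismatches++;
    }
}
echo $mismatches === 0 ? "outputs identical\n" : "$mismatches mismatch(es)\n";
```

The guard is safe here because the pattern is case-sensitive and every match must begin with the literal "http", so a string without that substring can never match.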

Answer 1 (score: -1)

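Yes, a simple search such as strpos() is much faster than compiling and executing a regular expression, and that is before counting the memory copies the replacement itself requires. If you are processing hundreds or thousands of strings, there is no point, but if you are processing millions (especially if only 10% contain http), it is worth doing a simple search first.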

Ultimately, the only way to be 100% sure is to benchmark it, but I'm fairly certain you will see some improvement by calling strpos() first.