Question

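For one of my applications, I assumed that comparing the first character of two strings would be faster than comparing the whole strings for equality. For example, suppose I know that only 2 possible strings (in a set of n strings) can start with the same letter (let's say 'q'), and that if they do, they are the same string. Then I might write a comparison like this: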

if ($stringOne[0] === $stringTwo[0]) $qString = true;

instead of:

if ($stringOne === $stringTwo) $qString = true;

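But I recently wrote some benchmark scripts, and it looks like I was wrong. That is, the second comparison appears to be 2-4 times faster than the first on average. My benchmark looks like this: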

$x = 'A really really looooooooooooong string';
$y = 'A really really looooooooooooong string';

$timeArray = array();

//Method 1, two-four times faster than Method 2
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    for($j = 0; $j < 100000; $j++) {
        if ($x === $y) continue;
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}

echo array_sum($timeArray) / 100;//average time is echoed

//Method 2
$timeArray = array(); // reset; otherwise the average below also includes Method 1's timings
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    for($j = 0; $j < 100000; $j++) {
        if ($x[0] === $y[0]) continue;
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}

echo array_sum($timeArray) / 100;//average time is echoed

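I suppose I assumed that because each string, $x and $y, is in memory, the first character of each string is also in memory, so the comparison would be faster.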

Why is the whole-string comparison faster? Is there a cost to "extracting" the first character from each string in order to compare it?

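Update: Method 1 is still faster than Method 2 even when new strings are generated and compared on each outer loop iteration, and regardless of whether the starting strings are identical or not.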

//Method 1 faster than Method 2 by 2-3 times
$timeArray = array();
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    $a = $x . $i; // new strings built at runtime on each outer iteration
    $b = $y . $i;
    for($j = 0; $j < 100000; $j++) {
        if ($a === $b) continue;
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}
echo array_sum($timeArray) / 100;//average time is echoed

//Method 2
$timeArray = array();
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    $a = $x . $i;
    $b = $y . $i;
    for($j = 0; $j < 100000; $j++) {
        if ($a[0] === $b[0]) continue;
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}
echo array_sum($timeArray) / 100;//average time is echoed

The same result holds when the two are compared with strict inequality (!==) instead of strict equality (===):

//Method 1 faster than Method 2 by 1.5-2 times, but now less of a difference
$timeArray = array();
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    $a = $x . $i;
    $b = $y . $i;
    for($j = 0; $j < 100000; $j++) {
        if ($a !== $b) continue; // using inequivalence this time
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}
echo array_sum($timeArray) / 100;//average time is echoed

//Method 2
$timeArray = array();
for($i = 0; $i < 100; $i++) {
    $t1 = microtime(true);
    $a = $x . $i;
    $b = $y . $i;
    for($j = 0; $j < 100000; $j++) {
        if ($a[0] !== $b[0]) continue;  // using inequivalence this time
    }
    $t2 = microtime(true);
    $timeArray[] = $t2 - $t1;
}
echo array_sum($timeArray) / 100;//average time is echoed

Answer 1

Static strings, like the ones in your script, will be interned (see String Interning on Wikipedia for a detailed explanation of what that means).

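Basically, this means that identical strings are stored only once in memory. When PHP compares them, it can see immediately that both strings refer to the same object in memory and needs no further checks. Comparing single characters taken from the strings most likely doesn't benefit from this optimization, which is why it can take longer.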
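To make the "extracting the first character" step concrete: PHP has no separate character type, so $x[0] is an offset read that yields a one-character string, which is then compared like any other string. A minimal sketch, reusing the literals from the question (the variable names $firstX and $firstY are illustrative only):

```php
<?php
// PHP has no char type: reading $s[0] yields a string of length 1,
// which is then compared with === like any other string.

$x = 'A really really looooooooooooong string';
$y = 'A really really looooooooooooong string';

$firstX = $x[0];   // string offset read
$firstY = $y[0];   // second offset read

var_dump($firstX);              // string(1) "A"
var_dump($firstX === $firstY);  // bool(true)
var_dump($x === $y);            // bool(true)
```

So the first-character version performs two offset reads per iteration in addition to the comparison. This snippet only shows the values involved; it does not measure whether those extra reads or the interning shortcut account for the timing difference.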

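Other factors are very likely at play as well, but this would be a significant one. Try building one or both strings dynamically and see how the results change, for example by changing your code to: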

$x = base64_decode(base64_encode('A really really looooooooooooong string')); // same content as the literal, but built at runtime
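The idea of building the strings at runtime can be sketched as a self-contained script. This is a minimal sketch, assuming PHP 7.3 or later for hrtime(true) (a nanosecond timer); the variable names and the iteration count are illustrative choices, not part of the original benchmark. Encoding first and then decoding returns exactly the original bytes, so both values stay equal to the literal. Each loop is timed as a whole, rather than each single comparison, to reduce the influence of the timer's own overhead.

```php
<?php
// Sketch (assumes PHP >= 7.3 for hrtime()): time a full-string comparison
// against a first-character comparison on strings produced at runtime.

$literal = 'A really really looooooooooooong string';

// base64_decode(base64_encode($s)) returns the same bytes as $s,
// but as a value computed at runtime instead of the literal itself.
$x = base64_decode(base64_encode($literal));
$y = base64_decode(base64_encode($literal));

$iterations = 1000000;

// Full-string strict comparison, timed over the whole loop.
$start = hrtime(true);
for ($j = 0; $j < $iterations; $j++) {
    if ($x === $y) continue;
}
$fullNs = hrtime(true) - $start;

// First-character strict comparison, timed the same way.
$start = hrtime(true);
for ($j = 0; $j < $iterations; $j++) {
    if ($x[0] === $y[0]) continue;
}
$firstCharNs = hrtime(true) - $start;

printf("full string: %d ns, first char: %d ns\n", $fullNs, $firstCharNs);
```

Absolute numbers will vary with the PHP version and the machine, so compare the two figures against each other rather than across runs on different setups.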

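As promised in the comments, here is a version of the script that works around string interning and any kind of equality caching that might be in play.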

The results I get here show the second method to be slightly faster.

<?php
$runs = 1000000;
// Note: the two inputs differ in their first character ("A" vs "B")
$input_string_a = "A really really looooooooooooong string";
$input_string_b = "B really really looooooooooooong string";

$total_time = 0;
for($i=0; $i<$runs; $i++) {
    // substr() is used with the intent of comparing runtime copies rather than the literals
    $a = substr($input_string_a, 0);
    $b = substr($input_string_b, 0);
    $start = microtime(true); // time only the comparison itself
    if($a === $b) {
        if(false) break; // no-op body
    }
    $end = microtime(true);
    $total_time += $end - $start;
}

echo $total_time."\n";

$total_time = 0;
for($i=0; $i<$runs; $i++) {
    $a = substr($input_string_a, 0);
    $b = substr($input_string_b, 0);
    $start = microtime(true);
    if($a[0] === $b[0]) { // compare first characters only
        if(false) break;
    }
    $end = microtime(true);
    $total_time += $end - $start;
}

echo $total_time."\n";

PHP - Why is comparing 2 full-length (identical) strings faster than comparing the first character of each string?
