Question

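I'm trying to write a function in PHP that takes an array of strings (needle) and compares it against another array of strings (haystack). The purpose of this function is to quickly deliver matching strings for an AJAX search, so it needs to be as fast as possible.

Here's some sample code to illustrate the two arrays: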

$needle = array('ba','hot','resta');

$haystack = array(
    'Southern Hotel',
    'Grange Restaurant & Hotel',
    'Austral Hotel',
    'Barsmith Hotel',
    'Errestas'
);

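Whilst this in itself is pretty easy, the aim of the comparison is to count how many of the needle strings appear in each haystack string.

However, there are three constraints:

The comparison is case-insensitive.
A needle must only match characters at the beginning of a word. For example, "hote" will match "Hotel", but "resta" will not match "Errestas".
We want to count the number of matching needles, not the number of times a needle appears. If a place is named "Hotel Hotel Hotel", we need the result to be 1, not 3.

Using the example above, we'd expect the following associative array as a result: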

$haystack = array(
    'Southern Hotel' => 1,
    'Grange Restaurant & Hotel' => 2,
    'Austral Hotel' => 1,
    'Barsmith Hotel' => 2,
    'Errestas' => 0
);

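I've been trying to implement a function to do this using preg_match_all() and a regular expression that looks like /(\A|\s)(ba|hot|resta)/. Whilst this ensures we only match at the beginning of words, it doesn't take into account strings that contain the same needle twice.

I'm posting to see whether anyone else has a solution?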
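
To illustrate the double-counting problem described above, here is a minimal sketch of one way the regex approach could be adapted: capture the needle text of every occurrence, then de-duplicate before counting. The helper name countDistinctNeedles is hypothetical and not part of the original code, and the "i" flag is added here as an assumption to get the case-insensitive behaviour.

```php
<?php
// Hypothetical helper: counts how many distinct needles start a word in $subject.
function countDistinctNeedles(array $needles, string $subject): int
{
    if (count($needles) === 0) {
        return 0; // an empty alternation would match everywhere
    }

    // Quote each needle so regex metacharacters are treated literally.
    $alternation = implode('|', array_map(function ($needle) {
        return preg_quote($needle, '/');
    }, $needles));

    // (?:\A|\s) anchors the match to the start of a word; "i" ignores case.
    preg_match_all('/(?:\A|\s)(' . $alternation . ')/i', $subject, $matches);

    // $matches[1] holds the captured needle text for every occurrence, so
    // lowercasing and de-duplicating leaves one entry per distinct needle.
    return count(array_unique(array_map('strtolower', $matches[1])));
}
```

One caveat with this sketch: when one needle is a prefix of another (say "ho" and "hot"), the alternation only ever captures the first alternative that matches at a position, so the two would be counted as one.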

Answer 1

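I found your description of the problem detailed enough that I could take a TDD approach to solving it. So, because I'm trying so hard to be a TDD guy, I wrote the tests and the function to make the tests pass. The naming might not be perfect, but it is easy to change. The function's algorithm may not be the best either, but now that there are tests, refactoring should be easy and painless.

Here are the tests: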

class MultiMatcherTest extends PHPUnit_Framework_TestCase
{
    public function testTheComparisonIsCaseInsensitive()
    {
        $needles  = array('hot');
        $haystack = array('Southern Hotel');
        $result   = match($needles, $haystack);

        $this->assertEquals(array('Southern Hotel' => 1), $result);
    }

    public function testNeedleMatchesOnlyCharsAtBeginningOfWord()
    {
        $needles  = array('resta');
        $haystack = array('Errestas');
        $result   = match($needles, $haystack);

        $this->assertEquals(array('Errestas' => 0), $result);
    }

    public function testMatcherCountsNeedlesNotOccurences()
    {
        $needles  = array('hot');
        $haystack = array('Southern Hotel', 'Grange Restaurant & Hotel');
        $expected = array('Southern Hotel'            => 1,
                          'Grange Restaurant & Hotel' => 1);
        $result   = match($needles, $haystack);

        $this->assertEquals($expected, $result);
    }

    public function testAcceptance()
    {
        $needles  = array('ba','hot','resta');
        $haystack = array(
            'Southern Hotel',
            'Grange Restaurant & Hotel',
            'Austral Hotel',
            'Barsmith Hotel',
            'Errestas',
        );
        $expected = array(
            'Southern Hotel'            => 1,
            'Grange Restaurant & Hotel' => 2,
            'Austral Hotel'             => 1,
            'Barsmith Hotel'            => 2,
            'Errestas'                  => 0,
        );

        $result = match($needles, $haystack);

        $this->assertEquals($expected, $result);
    }
}

Here's the function:

function match($needles, $haystack)
{
    // By default, every haystack string starts with 0 (zero) matched needles
    $result = array_fill_keys($haystack, 0);

    foreach ($needles as $needle) {

        foreach ($haystack as $subject) {
            $words = str_word_count($subject, 1); // split into words

            foreach ($words as $word) {
                if (stripos($word, $needle) === 0) {
                    $result[$subject]++;

                    break;
                }
            }
        }
    }

    return $result;
}

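Testing whether the `break` statement is necessary

The following test shows when the break is needed. Run this test both with and without the break statement in the match function.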

/**
 * This test demonstrates the purpose of the BREAK statement in the
 * implementation function. Without it, the needle will be matched twice.
 * "hot" will be matched for each "Hotel" word.
 */
public function testMatcherCountsNeedlesNotOccurences2()
{
    $needles  = array('hot');
    $haystack = array('Southern Hotel Hotel');
    $expected = array('Southern Hotel Hotel' => 1);
    $result   = match($needles, $haystack);

    $this->assertEquals($expected, $result);
}

Answer 2

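Array and string functions are usually faster than regexp. It should be fairly easy to accomplish what you want using a combination of array_filter and substr_count.

Cheers,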
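
A rough sketch of what that combination might look like, under a couple of assumptions: words are separated by single spaces, and the function name countNeedlesWithStringFunctions is made up for illustration. Since substr_count is case-sensitive and knows nothing about word boundaries, the sketch lowercases both sides and prepends a space so that " needle" can only match at the start of a word.

```php
<?php
// Hypothetical helper: maps each haystack string to the number of needles
// that start at least one of its (space-separated) words.
function countNeedlesWithStringFunctions(array $needles, array $haystack): array
{
    $result = array();

    foreach ($haystack as $subject) {
        // The leading space lets " needle" match the first word as well.
        $padded = ' ' . strtolower($subject);

        // Keep only the needles that occur at a word start at least once.
        $matched = array_filter($needles, function ($needle) use ($padded) {
            return substr_count($padded, ' ' . strtolower($needle)) > 0;
        });

        $result[$subject] = count($matched);
    }

    return $result;
}
```

Because each needle is only tested for "occurs at least once", a name like "Hotel Hotel Hotel" still counts the needle "hot" a single time.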

Answer 3

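@Ionut G. Stan Wow, what an answer!

@Lachlan McDonald If you have speed concerns (try it first, don't just assume :)) you can take advantage of the fact that a needle has to match the beginning of a word: split the haystack by first letter when you build it, and iterate only over the part of the haystack whose first letter matches the needle's first letter.

That's fewer than 1/10 of the comparisons for each needle.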
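
The first-letter idea could be sketched roughly like this. The two function names and the shape of the index (first character => list of word/subject pairs) are assumptions made for illustration, and words are assumed to be separated by whitespace.

```php
<?php
// Hypothetical: group every word of every haystack string by its first character.
function buildFirstLetterIndex(array $haystack): array
{
    $index = array();
    foreach ($haystack as $subject) {
        // Lowercase once at build time so lookups are case-insensitive.
        $words = preg_split('/\s+/', strtolower($subject), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $index[$word[0]][] = array($word, $subject);
        }
    }
    return $index;
}

// Hypothetical: count matching needles per haystack string using the index.
function matchWithIndex(array $needles, array $haystack, array $index): array
{
    $result = array_fill_keys($haystack, 0);

    foreach ($needles as $needle) {
        $needle = strtolower($needle);
        if ($needle === '') {
            continue; // nothing to look up for an empty needle
        }

        // Only words sharing the needle's first character can match.
        $bucket = isset($index[$needle[0]]) ? $index[$needle[0]] : array();
        $seen = array(); // subjects already credited for this needle

        foreach ($bucket as $entry) {
            list($word, $subject) = $entry;
            if (!isset($seen[$subject]) && strpos($word, $needle) === 0) {
                $result[$subject]++;
                $seen[$subject] = true;
            }
        }
    }

    return $result;
}
```

The index only needs to be built once per haystack, for example when the data is loaded, and can then be reused for every search request. How much it actually saves depends on how the first letters are distributed, so it is worth measuring first.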

Answer 4

You could try:

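$results = array();
foreach ($haystack as $stack) {
    $counter = 0;
    $lcstack = strtolower($stack); // lowercase once per haystack string
    foreach ($needle as $need) {
        // Compares the needle against the start of the whole string only
        if (substr($lcstack, 0, strlen($need)) == strtolower($need)) {
            $counter++;
        }
    }
    $results[$stack] = $counter;
}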
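
If the check should apply to the start of every word rather than only the start of the string, a variant along these lines might work. It is a sketch under assumptions: words are split on whitespace, and the function name countWordPrefixMatches is hypothetical (a name other than match is used because match is a reserved word as of PHP 8).

```php
<?php
// Hypothetical helper: for each haystack string, count the needles that are a
// case-insensitive prefix of at least one whitespace-separated word.
function countWordPrefixMatches(array $needles, array $haystack): array
{
    $results = array();

    foreach ($haystack as $stack) {
        $words = preg_split('/\s+/', $stack, -1, PREG_SPLIT_NO_EMPTY);
        $counter = 0;

        foreach ($needles as $need) {
            foreach ($words as $word) {
                // strncasecmp compares the first strlen($need) bytes, ignoring case.
                if (strncasecmp($word, $need, strlen($need)) === 0) {
                    $counter++;
                    break; // count each needle at most once per string
                }
            }
        }

        $results[$stack] = $counter;
    }

    return $results;
}
```

Note that an empty needle compares equal to any word here, so callers may want to filter empty strings out of the needle list first.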
