Efficiently searching a string for multiple keywords in PHP

Time: 2013-09-14 23:19:46

Tags: php regex

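I'm currently trying to search many strings per second against a list of thousands of keywords. Until recently everything worked with a regex (which I can post), but it is probably pretty bad. What approaches could I use? I've read a little about tries, but I'm not sure whether they fit my needs.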
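Since the question asks whether a trie fits, here is a minimal sketch of one, assuming ASCII text and keywords that should match as whole words. The helpers `tokenize()`, `buildTrie()` and `findKeywords()` are hypothetical names: the trie is built once from the keyword list, with one edge per lowercase word so multi-word keywords work, and each string is then walked word by word.

```php
<?php
// Hypothetical word-level trie for keyword matching (ASCII text assumed).
// Each trie edge is one lowercase word, so multi-word keywords are supported.

// Split text into lowercase alphanumeric words, dropping spaces and punctuation.
function tokenize(string $text): array {
    return preg_split('/[^[:alnum:]]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Build the trie once from the whole keyword list.
function buildTrie(array $keywords): array {
    $trie = [];
    foreach ($keywords as $keyword) {
        $words = tokenize($keyword);
        if (!$words) {
            continue; // keyword has no alphanumeric content
        }
        $node = &$trie;
        foreach ($words as $word) {
            if (!isset($node[$word])) {
                $node[$word] = [];
            }
            $node = &$node[$word];
        }
        // '#end' can never collide with a word, because words are alphanumeric only
        $node['#end'] = $keyword;
        unset($node); // break the reference before the next keyword
    }
    return $trie;
}

// Walk the trie from every word position; the work per string is roughly
// (words in the string) x (words in the longest keyword), not (number of keywords).
function findKeywords(array $trie, string $haystack): array {
    $words = tokenize($haystack);
    $count = count($words);
    $found = [];
    for ($i = 0; $i < $count; $i++) {
        $node = $trie;
        for ($j = $i; $j < $count && isset($node[$words[$j]]); $j++) {
            $node = $node[$words[$j]];
            if (isset($node['#end'])) {
                $found[$node['#end']] = $node['#end']; // dedupe by keyword
            }
        }
    }
    return array_values($found);
}

$trie = buildTrie(['php', 'regular expression', 'trie']);
echo implode(', ', findKeywords($trie, 'A PHP Regular-Expression beats a trie? Maybe.')), "\n";
// php, regular expression, trie
```

Note that this returns the keywords as listed rather than the exact text matched, and it does not reproduce the regex below, which also lets punctuation appear between letters (so "e-mail" there matches "email", but here they tokenize differently).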

// 100 strings per second
// 100 characters long average
foreach ($stringSet as $haystack) {
    // 10000 keywords
    // 10 characters long average and can be multiple words
    $matches = stringContains($needles, $haystack);
    // Do stuff with matches
}

The regex version (it does not exactly match the code above, since that code is somewhat pseudocode):

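function stringContains($needles, $haystack) {
    $matchingTerms = array();
    foreach ($needles as $needle) {
        // Strip every non-alphanumeric character from the keyword ("e-mail" -> "email")
        $needle = preg_split('/([^[:alnum:]])+/u', $needle);
        $needle = implode('', $needle);
        // Split the remaining keyword into single (multibyte-safe) characters
        $needle  = preg_split('/(?<!^)(?!$)/u', $needle);
        // Allow any run of non-alphanumeric characters between letters ("e-mail", "e mail")
        $pattern = implode('[^[:alnum:]]*', $needle);
        $pattern = '/\b' . $pattern . '\b/iu';

        // One separate regex scan of $haystack per keyword
        preg_match_all($pattern, $haystack, $matches);
        // $matches[0] holds the full-pattern matches (the pattern has no capture groups)
        $matchingTerms = array_merge($matchingTerms, $matches[0]);
    }
    return $matchingTerms;
}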
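To see what this regex actually matches, the snippet below rebuilds the pattern for a single keyword. `keywordPattern()` is a hypothetical helper that only wraps the three pattern-building steps from the loop body above.

```php
<?php
// Hypothetical helper: rebuilds the question's per-keyword pattern for inspection.
function keywordPattern(string $needle): string {
    // Remove non-alphanumeric characters, then split into single characters
    $needle = implode('', preg_split('/([^[:alnum:]])+/u', $needle));
    $chars  = preg_split('/(?<!^)(?!$)/u', $needle);
    // Any run of non-alphanumerics may appear between the characters
    return '/\b' . implode('[^[:alnum:]]*', $chars) . '\b/iu';
}

$pattern = keywordPattern('e-mail');
echo $pattern, "\n";
// /\be[^[:alnum:]]*m[^[:alnum:]]*a[^[:alnum:]]*i[^[:alnum:]]*l\b/iu

preg_match_all($pattern, 'Send an E-mail or an email, not mail.', $m);
echo implode(', ', $m[0]), "\n";
// E-mail, email
```

Because a pattern like this is built and run separately for every keyword, each string is scanned once per keyword (about 10,000 times with the numbers in the question), which is where most of the time goes.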

1 answer:

Answer 0 (score: 1)

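Something like the following might work: combine the keywords into a single alternation so the string is scanned once per group of keywords instead of once per keyword.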

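function stringContains($needles, $haystack) {
   // Assumes $needles is a list of keyword batches, e.g. array_chunk($keywords, 500),
   // so that each combined pattern stays within PCRE's pattern size limits
   $matchingTerms = array();
   $matches = array();

   foreach ($needles as $needle) {
      // Escape each keyword in the batch and join them into one alternation
      // \b(kw1|kw2|...)\b, so the haystack is scanned once per batch
      $quoted  = array_map(function ($kw) { return preg_quote($kw, '/'); }, $needle);
      $pattern = "/\b(" . implode('|', $quoted) . ")\b/i";
      $found   = preg_match_all($pattern, $haystack, $matches);

      if ($found) {
         // $matches[0] holds every full match from this batch
         $matchingTerms = array_merge($matchingTerms, $matches[0]);
      }
   }
   // Drop duplicates (a keyword may occur several times) and reindex
   return array_values(array_unique($matchingTerms));
}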
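A further refinement, sketched under the assumption of PHP 7.4+ (for the `fn` arrow function): the combined patterns depend only on the keyword list, so they can be built once, before the `foreach ($stringSet ...)` loop, and reused for every string. `buildPatterns()`, `matchAll()` and the default batch size of 500 are hypothetical choices; 500 is a guess meant to keep each pattern comfortably small, not a measured value.

```php
<?php
// Hypothetical: build the batched alternation patterns once, outside the per-string loop.
function buildPatterns(array $keywords, int $batchSize = 500): array {
    $patterns = [];
    foreach (array_chunk($keywords, $batchSize) as $batch) {
        // Escape regex metacharacters and '/' (the delimiter) in every keyword
        $quoted = array_map(fn($kw) => preg_quote($kw, '/'), $batch);
        // Non-capturing group: only the full match in $m[0] is needed
        $patterns[] = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    }
    return $patterns;
}

// Run every prebuilt pattern against one string and collect unique matches.
function matchAll(array $patterns, string $haystack): array {
    $terms = [];
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $haystack, $m)) {
            array_push($terms, ...$m[0]);
        }
    }
    return array_values(array_unique($terms));
}

// Batch size 2 only to force several patterns in this small demo
$patterns = buildPatterns(['php', 'regex', 'trie', 'word list'], 2);
echo implode(', ', matchAll($patterns, 'PHP regex vs. a trie')), "\n";
// PHP, regex, trie
```

The matches are returned as they appear in the string (here "PHP", not "php"), because `$m[0]` holds the matched text rather than the keyword.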