What is the fastest way to do a REGEXP find-and-replace in a MySQL database?

Time: 2015-10-13 03:25:23

Tags: php mysql regex performance query-optimization

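I have a table with 700,000 entries, and I need to check each entry against 1,000,000 words and replace every word that is found, e.g. hello, with #~hello~#. A word can appear several times in an entry, and every occurrence must be replaced. I tried this in PHP, and the estimated time to finish was about 362 days. I have just changed the code to use LIKE in MySQL, so that I no longer check all 1,000,000 words against all 700,000 entries, but the estimated time to finish is still 29 days. That seems really high.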

To complicate matters, a "word" can consist of several words. For example, if the word is hello world, the program should replace it with #~hello world~#.

What am I missing?

The code looks like this:

$query = "SELECT word_id, word_name FROM words ORDER BY char_length(word_name) DESC";
$result = mysqli_query($con, $query);
$words = array();
while ($row = mysqli_fetch_array($result)) {
  // wordObj is the asker's own class; its definition is not shown
  $words[] = new wordObj($row['word_id'], $row['word_name']);
}

foreach ($words as $word) {
  // Assumption: wordObj exposes the word text as $word->word_name
  $pattern = mysqli_real_escape_string($con, $word->word_name);
  $query = "SELECT id, entry FROM entries WHERE entry LIKE '%" . $pattern . "%'";
  $result = mysqli_query($con, $query);
  $entryArray = array();
  if ($result) {
    while ($row = mysqli_fetch_array($result)) {
      // meatObj is likewise the asker's own class
      $entryArray[] = new meatObj($row['id'], $row['entry']);
    }
  }
  foreach ($entryArray as $entry) {
    // check entry for all words and replace
  }
}

1 answer:

Answer 0: (score: 2)

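The simplest solution is to store all the words that need to be replaced in a hash table. Then, for each entry, split it into words and look each one up in the hash table. Each lookup takes roughly constant time, so every entry is scanned once instead of once per word.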

// HOW DOES THIS TAKE 29 DAYS TO EXECUTE?
// Create a hash table to store all the words
$hash = array();

$query = "SELECT word_id, word_name FROM words ORDER BY char_length(word_name) DESC";
$result = mysqli_query($con, $query);
while($row = mysqli_fetch_array($result)){
    $hash[strtolower($row['word_name'])] = true;
}



// DO SOME QUERY HERE
// .....

while($row = mysqli_fetch_array($result)) {
    $delimiter = "/([ \.,\"'!\?\-_;])/";
    $tokens = preg_split($delimiter, $row['entry'], -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // replace the text
    $final = "";
    foreach($tokens as $token) {
        if (isset($hash[strtolower($token)])) {
            $final .= "#~" . $token . "~#";
        } else {
            $final .= $token;
        }
    }

    // UPDATE NEW ENTRY HERE
    // .......
}
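
The splitting above yields single words only, so a phrase such as hello world from the question would never be found as one key. The following is a minimal, self-contained sketch of the same hash-table idea, run on in-memory strings rather than database rows, with a greedy longest-match step added so that phrases are wrapped as a unit. The names `tokenize`, `buildWordHash` and `markWords`, and the longest-match step itself, are assumptions made for illustration and are not part of the code above.

```php
<?php
// Delimiters are captured so the original spacing and punctuation survive.
const DELIMITER = '/([ .,"\'!?\-_;])/';

function tokenize(string $text): array {
    return preg_split(DELIMITER, $text, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

// Returns array(hash, maxTokens): lowercased words/phrases as keys, plus the
// largest token count of any phrase, which bounds the look-ahead below.
function buildWordHash(array $words): array {
    $hash = array();
    $maxTokens = 1;
    foreach ($words as $word) {
        $key = strtolower($word);
        $hash[$key] = true;
        $maxTokens = max($maxTokens, count(tokenize($key)));
    }
    return array($hash, $maxTokens);
}

function markWords(string $entry, array $hash, int $maxTokens): string {
    $tokens = tokenize($entry);
    $count = count($tokens);
    $final = '';
    $i = 0;
    while ($i < $count) {
        $matched = false;
        // Try the longest candidate first so "hello world" wins over "hello".
        for ($len = min($maxTokens, $count - $i); $len >= 1; $len--) {
            $candidate = implode('', array_slice($tokens, $i, $len));
            if (isset($hash[strtolower($candidate)])) {
                $final .= '#~' . $candidate . '~#';
                $i += $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $final .= $tokens[$i];
            $i++;
        }
    }
    return $final;
}

list($hash, $maxTokens) = buildWordHash(array('hello', 'hello world'));
echo markWords('Hello world, hello there', $hash, $maxTokens), "\n";
// prints: #~Hello world~#, #~hello~# there
```

In this sketch each token position costs at most `$maxTokens` hash lookups, so the work per entry grows with the entry's length and not with the size of the word list. A phrase matches only when the delimiters between its words are the same in the entry as in the word list, so hello world with two spaces would be missed unless whitespace is normalized first.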