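I have a table with 700,000 entries, and I need to check each entry against 1,000,000 words and replace every word that is found, e.g. hello, with #~hello~#. A word can appear several times in an entry, and every occurrence must be replaced. I first tried this in PHP, and the estimated running time was about 362 days. I have just changed the code to use LIKE in MySQL so that I no longer check all 1,000,000 words against all 700,000 entries, but the estimate is still 29 days. That seems very high.
To complicate matters, a "word" can consist of several words. For example, if the word is hello world, the program should replace it with #~hello world~#.
What am I missing?
The code looks like this: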
$query = "SELECT word_id, word_name FROM words ORDER BY CHAR_LENGTH(word_name) DESC";
$result = mysqli_query($con, $query);
$words = array();
while ($row = mysqli_fetch_array($result)) {
    $words[] = new wordObj($row['word_id'], $row['word_name']);
}
foreach ($words as $word) {
    // One LIKE '%...%' query per word: every query scans the whole entries table.
    // Assumes wordObj exposes the word text as ->word_name (property name not shown in the post).
    $like = mysqli_real_escape_string($con, $word->word_name);
    $query = "SELECT id, entry FROM entries WHERE entry LIKE '%" . $like . "%'";
    $result = mysqli_query($con, $query);
    $entries = array();
    if ($result) {
        while ($row = mysqli_fetch_array($result)) {
            $entries[] = new meatObj($row['id'], $row['entry']);
        }
    }
    foreach ($entries as $entry) {
        // check entry for all words and replace
    }
}
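Answer 0 (score: 2)
The simplest solution is to store every word that needs replacing in a hash table. Then, for each entry, split the text into words and look each one up in the hash table. Each lookup takes roughly constant time, so the cost per entry depends on the number of words in the entry, not on the 1,000,000 words in the list.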
// HOW DOES THIS TAKE 29 DAYS TO EXECUTE?
// Create a hash table to store all the words
$hash = array();
$query = "SELECT word_id, word_name FROM words ORDER BY CHAR_LENGTH(word_name) DESC";
$result = mysqli_query($con, $query);
while ($row = mysqli_fetch_array($result)) {
    // Lower-case the key so the lookup is case-insensitive
    $hash[strtolower($row['word_name'])] = true;
}
// DO SOME QUERY HERE
// .....
while ($row = mysqli_fetch_array($result)) {
    // Split on punctuation and spaces, keeping the delimiters as tokens
    $delimiter = "/([ \.,\"'!\?\-_;])/";
    $tokens = preg_split($delimiter, $row['entry'], -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    // replace the text
    $final = "";
    foreach ($tokens as $token) {
        if (isset($hash[strtolower($token)])) {
            $final .= "#~" . $token . "~#";
        } else {
            $final .= $token;
        }
    }
    // UPDATE NEW ENTRY HERE
    // .......
}
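The loop above looks up one token at a time, so a multi-word entry such as hello world never matches. One possible way to cover that case is to try the longest window of tokens first and fall back to shorter ones. The following is a minimal sketch under stated assumptions, not a tested production routine: the names buildPhraseTable, markPhrases and $maxWords are made up for illustration, the word list is passed in as a plain array instead of being read from MySQL, and words inside a phrase are assumed to be separated by exactly one delimiter character.

```php
<?php
// Hypothetical helper: builds the lookup table and records the longest phrase length.
function buildPhraseTable(array $phrases): array
{
    $table = array();
    $maxWords = 1;
    foreach ($phrases as $phrase) {
        $key = strtolower(trim($phrase));
        $table[$key] = true;
        // Count the words so the scanner knows the widest window to try
        $maxWords = max($maxWords, count(preg_split('/\s+/', $key)));
    }
    return array($table, $maxWords);
}

// Hypothetical helper: wraps every known word or phrase in #~ ... ~#.
function markPhrases(string $entry, array $table, int $maxWords): string
{
    // Same delimiter set as the answer; delimiters are kept as tokens
    $tokens = preg_split('/([ .,"\'!?\-_;])/', $entry, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $n = count($tokens);
    $out = '';
    $i = 0;
    while ($i < $n) {
        $matched = false;
        // A phrase of $k words spans 2*$k - 1 tokens (words plus the delimiters between them)
        for ($k = $maxWords; $k >= 1; $k--) {
            $len = 2 * $k - 1;
            if ($i + $len > $n) {
                continue;
            }
            $candidate = implode('', array_slice($tokens, $i, $len));
            if (isset($table[strtolower($candidate)])) {
                $out .= '#~' . $candidate . '~#';
                $i += $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            $out .= $tokens[$i];
            $i++;
        }
    }
    return $out;
}
```

With the word list array('hello world', 'hello'), calling markPhrases('Hello world, hello there', $table, $maxWords) returns #~Hello world~#, #~hello~# there. Each token costs at most $maxWords hash lookups, so the work per entry still does not grow with the size of the word list. Note that strtolower only lower-cases ASCII letters, so non-ASCII words would need different handling.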