How to handle multiple search conditions and priorities in a full-text search

Time: 2019-09-17 07:01:11

Tags: php mysql full-text-search

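Is there any way to reduce the number of queries executed? The way I do it now works, but I could end up running 30 queries, and that does not look right to me.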

My script

$string = 'new movie stars';
$words =  preg_split('/(\/|\s+)/', $string);
print_r($words);
  

Array ( [0] => new [1] => movie [2] => stars )

$sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +$words[1] +$words[2]' IN BOOLEAN MODE)";
$query_name = $this->db->query($sql);

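// Running total of rows found so far (num_rows is already an integer, so no count()).
$total = $query_name->num_rows;

// Fallback 1: the first word is required, plus at least one of the other two.
if ($total < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +($words[1] $words[2])' IN BOOLEAN MODE)";
    $query_name_two = $this->db->query($sql);
    $total += $query_name_two->num_rows;
}

// Fallback 2: any of the three words.
if ($total < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('$words[0] $words[1] $words[2]' IN BOOLEAN MODE)";
    $query_name_three = $this->db->query($sql);
}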

1 answer:

Answer 0: (score: 1)

Your code is open to SQL injection attacks. Even real_escape_string cannot secure it completely. Please learn to use prepared statements instead.
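As a minimal sketch of that advice, the whole Boolean expression can be bound as a single parameter instead of being interpolated into the SQL. The sketch assumes a PDO connection (the question's `$this->db` wrapper is not shown, so PDO is a swapped-in choice) and the question's `movie` table with a FULLTEXT index on `name`; `buildRequireAllExpression` and `findMoviesMatchingAll` are hypothetical names.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: prefix every word with "+" so that all of them are required.
 * Boolean-mode operators are stripped from each word first, because binding
 * protects the SQL statement but not the full-text expression itself.
 */
function buildRequireAllExpression(array $words): string
{
    $terms = [];
    foreach ($words as $word) {
        $clean = trim((string) preg_replace('/[><()~*:"&|+-]/', '', (string) $word));
        if ($clean !== '') {
            $terms[] = '+' . $clean;
        }
    }
    return implode(' ', $terms);
}

/**
 * Hypothetical helper: the question's first query, run with a bound parameter.
 * Assumes a `movie` table with a FULLTEXT index on `name`, as in the question.
 */
function findMoviesMatchingAll(PDO $pdo, array $words): array
{
    $sql = 'SELECT * FROM movie WHERE MATCH(name) AGAINST(:expr IN BOOLEAN MODE)';
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':expr' => buildRequireAllExpression($words)]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

The expression is built in PHP because a placeholder inside a quoted SQL string is never bound; only the complete AGAINST argument can be passed as a parameter.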

Apart from the suggestion above, there are two more possible fixes:

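Fix #1: The PHP code you use to tokenize the input string into words for FTS is insufficient. Some time ago I wrote a function that handles this requirement more robustly. You can use the following instead: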

/**
 * Method to take an input string and tokenize it into an array of words for Full Text Searching (FTS).
 * This method is used when an input string can be made up of multiple words (let's say, separated by space characters),
 * and we need to use different Boolean operators on each of the words. The tokenizing process is similar to extraction
 * of words by FTS parser in MySQL. The operators used for matching in Boolean condition are removed from the input $phrase.
 * These characters as of latest version of MySQL (8+) are: +-><()~*:""&|
 * We can also execute the following query to get updated list: show variables like 'ft_boolean_syntax';
 * Afterwards, the modified string is split into individual words on whitespace, comma, and period (.) characters.
 * Details at: https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html
 * @param string $phrase Input statement/phrase consisting of words
 * @return array Tokenized words
 * @author Madhur, 2019
 */
function tokenizeStringIntoFTSWords(string $phrase) : array {
    $phrase_mod = trim(preg_replace('/[><()~*:"&|+-]/', '', trim($phrase)));
    // -1 means "no limit"; passing null for $limit is deprecated as of PHP 8.1.
    return preg_split('/[\s,.]/', $phrase_mod, -1, PREG_SPLIT_NO_EMPTY);
}
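As a quick illustration of what the function returns, the sketch below repeats it in compact form so that the snippet runs on its own (using `-1` as the "no limit" argument) and applies it to the question's string and to a hypothetical input that contains operators and punctuation.

```php
<?php
declare(strict_types=1);

// Compact copy of the function above, repeated so this snippet is standalone.
function tokenizeStringIntoFTSWords(string $phrase): array
{
    $phrase_mod = trim(preg_replace('/[><()~*:"&|+-]/', '', trim($phrase)));
    return preg_split('/[\s,.]/', $phrase_mod, -1, PREG_SPLIT_NO_EMPTY);
}

// The question's input: plain words separated by spaces.
$plain = tokenizeStringIntoFTSWords('new movie stars');

// Hypothetical input with Boolean operators and punctuation mixed in.
$messy = tokenizeStringIntoFTSWords('+new, -movie "stars".');

print_r($plain);
print_r($messy);
```

Both calls yield the same three words, `new`, `movie` and `stars`, because the operators are removed before the string is split.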

Fix #2: It seems that you are trying to rank the search results by giving priority in the following order:

All words present in the text > the first word AND either of the remaining two words > at least one of the three words.

However, if you read the Full Text Search documentation, you will see that you can sort by relevance using MATCH(), because it also returns a relevance score.

  

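When MATCH() is used in a WHERE clause, the rows returned are automatically sorted with the highest relevance first (unfortunately, this works only in natural language mode, not in Boolean mode). Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.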

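So a row containing all of the words will already have a higher relevance than a row containing only one of the three. If you also need to give higher priority to the first word, simply use the > operator on it. You then need only the following single query: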

SELECT * FROM movie
WHERE
  MATCH(name)
  AGAINST('>:first_word :second_word :third_word ..and so on' IN BOOLEAN MODE)
ORDER BY
  MATCH(name)
  AGAINST('>:first_word :second_word :third_word ..and so on' IN BOOLEAN MODE)
  DESC
LIMIT 20
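One way to run that single query from PHP is sketched below. It assumes a PDO connection and the question's `movie` table; `buildRankedExpression` and `searchMoviesRanked` are hypothetical names, and the words are expected to be free of Boolean operators already (for example, after passing through the tokenizer from fix #1). Two separate placeholders are used because a named placeholder cannot be reused when PDO's emulated prepares are turned off.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: boost the first word with ">" and leave the rest optional.
 * Expects words that no longer contain Boolean-mode operators.
 */
function buildRankedExpression(array $words): string
{
    // Drop empty entries and reindex so that $words[0] is the first real word.
    $words = array_values(array_filter($words, fn ($w) => $w !== ''));
    if ($words === []) {
        return '';
    }
    $words[0] = '>' . $words[0];
    return implode(' ', $words);
}

/**
 * Hypothetical helper: one query, ordered by the relevance score MATCH() returns.
 */
function searchMoviesRanked(PDO $pdo, array $words, int $limit = 20): array
{
    $expr = buildRankedExpression($words);
    if ($expr === '') {
        return [];
    }
    // $limit is a typed integer, so appending it to the SQL text is safe.
    $sql = 'SELECT * FROM movie
            WHERE MATCH(name) AGAINST(:expr_where IN BOOLEAN MODE)
            ORDER BY MATCH(name) AGAINST(:expr_order IN BOOLEAN MODE) DESC
            LIMIT ' . max(1, $limit);
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':expr_where' => $expr, ':expr_order' => $expr]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

For the question's input, `buildRankedExpression(['new', 'movie', 'stars'])` produces `>new movie stars`, so the three queries in the question collapse into one ranked query.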