Apostrophe/stopword problem in MySQL FULLTEXT search

Asked: 2019-03-14 16:06:09

Tags: mysql full-text-search

I'm using MySQL 5.6.33 with full-text search set up.

I know single quotes are a problem in this version: an apostrophe inside a word seems to cause trouble for searches. https://bugs.mysql.com/bug.php?id=69932

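I tried to work around this by saving both the original string and a version with the apostrophes removed into the column that has the FULLTEXT index. However, searching against it gives no score: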

SELECT record_id, keywords, 
  MATCH (keywords) AGAINST ("+ill +be +there" IN BOOLEAN MODE) AS score
FROM my_table_name 
WHERE record_id = 12336
ORDER BY score DESC;
+-----------+----------------------------+-------+
| record_id | keywords                   | score |
+-----------+----------------------------+-------+
|     12336 | i'll be there ill be there |     0 |
+-----------+----------------------------+-------+
1 row in set (0.03 sec)

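According to that bug, a boolean search for +I'll will fail. But here I'm doing a BOOLEAN search on ill, with no apostrophe, and that word is present in the field. I still get a score of 0. If I switch to natural language mode, it works fine.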
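For comparison, the natural language variant of the same query would look roughly like the sketch below. It reuses the table, column, and record_id from the query above; only the search mode and the missing "+" operators differ.

```sql
-- Same row and same terms, but without the boolean "+" (required) operators.
SELECT record_id, keywords,
  MATCH (keywords) AGAINST ('ill be there' IN NATURAL LANGUAGE MODE) AS score
FROM my_table_name
WHERE record_id = 12336;
```

In natural language mode no individual term is mandatory, so a term that is missing from the index lowers the score at most rather than ruling the row out.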

My guess is that, whatever the bug is, it also breaks the indexing of the apostrophe-free version of the word, which is annoying.

Has anyone run into this problem and found a workaround?

Edit: actually, this is even stranger/buggier than that. It looks like the presence of a stopword may be what makes the score come back as zero:

mysql> SELECT record_id, keywords, MATCH (keywords) AGAINST ("+be +there" IN BOOLEAN MODE) AS score  FROM squirrel_digilearning_modules  WHERE record_id = 12336  ORDER BY score DESC;
+-----------+--------------+-------+
| record_id | keywords     | score |
+-----------+--------------+-------+
|     12336 | ill be there |     0 |
+-----------+--------------+-------+
1 row in set (0.00 sec)

mysql> SELECT record_id, keywords, MATCH (keywords) AGAINST ("+there" IN BOOLEAN MODE) AS score  FROM squirrel_digilearning_modules  WHERE record_id = 12336  ORDER BY score DESC;
+-----------+--------------+-------------------+
| record_id | keywords     | score             |
+-----------+--------------+-------------------+
|     12336 | ill be there | 7.561973571777344 |
+-----------+--------------+-------------------+

Why does "+there" match when "+be +there" does not? And then there's this:

mysql> SELECT record_id, keywords, MATCH (keywords) AGAINST ("+ill +there" IN BOOLEAN MODE) AS score  FROM squirrel_digilearning_modules  WHERE record_id = 12336  ORDER BY score DESC;
+-----------+--------------+-------------------+
| record_id | keywords     | score             |
+-----------+--------------+-------------------+
|     12336 | ill be there | 17.85671615600586 |
+-----------+--------------+-------------------+
1 row in set (0.00 sec)

Above, I'm matching "ill" and "there" and I get a good score. So the problem is actually "be"! I think this may be because "be" is a stopword:

mysql> SELECT GROUP_CONCAT(value) FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-----------------------------------------------------------------------------------------------------------------------------------------+
| GROUP_CONCAT(value)                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------+
| a,about,an,are,as,at,be,by,com,de,en,for,from,how,i,in,is,it,la,of,on,or,that,the,this,to,was,what,when,where,who,will,with,und,the,www |
+-----------------------------------------------------------------------------------------------------------------------------------------+
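One way to test the stopword guess, as a rough sketch only: turn stopword filtering off, rebuild the FULLTEXT index, and re-run the failing query. The index name ft_keywords is hypothetical (SHOW INDEX FROM my_table_name would show the real one), and this assumes the table is InnoDB and that the account is allowed to set global variables.

```sql
-- Inspect the current full-text settings first (stopwords, token sizes).
SHOW VARIABLES LIKE 'innodb_ft%';

-- Disable stopword filtering for indexes created from now on.
SET GLOBAL innodb_ft_enable_stopword = OFF;
SET SESSION innodb_ft_enable_stopword = OFF;

-- The stopword setting is tied to the index when it is created,
-- so the index has to be rebuilt. ft_keywords is a hypothetical name.
ALTER TABLE my_table_name DROP INDEX ft_keywords;
ALTER TABLE my_table_name ADD FULLTEXT INDEX ft_keywords (keywords);

-- Re-run the query that returned a score of 0.
SELECT record_id, keywords,
  MATCH (keywords) AGAINST ('+be +there' IN BOOLEAN MODE) AS score
FROM my_table_name
WHERE record_id = 12336;
```

One caveat: "be" is also shorter than the default innodb_ft_min_token_size of 3, so it may stay out of the index even with stopwords disabled. Repeating the test with a longer stopword from the list, such as "about" or "this", would help separate the two effects.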

0 answers:

No answers