MySQL FULLTEXT query produces unexpected ranking; why?

Time: 2012-09-10 21:59:24

Tags: mysql search full-text-search match

I'm trying to run a full-text search on tags, but it isn't working the way I expect. Please see the attached image.

The query is:

    SELECT *,
           MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score
    FROM items
    ORDER BY score DESC

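Why aren't the rows sorted in the right order by score? If you check, the second row has all the tags I searched for, while the first row doesn't have the tag3 keyword.

I mean the id order should be 5, 1, 2, etc. rather than 1, 5, 2, etc.

Where is my mistake?

Next, I'd like to search the tags field first and, if there are no results, search the description field for the same keywords with FULLTEXT. In other words, when the tags don't match, the user ends up searching both tags and description. Can this be done in a single query, or do I need two separate queries?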

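3 Answers:

Answer 0: (score: 2)

This documentation page, http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html, says: "For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results."

If your items table is small (a sample table, for example), you are probably running into this problem and getting a "bizarre" result.

You can try the query IN BOOLEAN MODE to see whether the results match what you expect. Try this: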

    SELECT *,
           MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS score
    FROM items
    ORDER BY score DESC

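Boolean mode disables ranking by word distribution. Note that you should understand the difference between natural-language mode and boolean mode so that, once you have a reasonably sized table, you can make an informed choice about which one to use. If you are searching the kind of tags found on blogs, boolean mode is probably the best choice.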
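
As a rough illustration of what boolean mode offers beyond plain matching, the sketch below uses the + operator to require one tag while leaving the other two optional, and repeats the MATCH expression in WHERE so that non-matching rows are dropped. The choice of tag3 as the required tag is arbitrary and only for illustration.

```sql
-- Sketch only: tag3 is required (+), tag6 and tag4 are optional
-- and raise the score of rows that contain them.
SELECT *,
       MATCH(tags) AGAINST ('+tag3 tag6 tag4' IN BOOLEAN MODE) AS score
FROM items
-- Repeating MATCH in WHERE filters out rows that do not match.
WHERE MATCH(tags) AGAINST ('+tag3 tag6 tag4' IN BOOLEAN MODE)
ORDER BY score DESC;
```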

Answer 1: (score: 1)

First, here is your sample data loaded into MySQL 5.5.12 on my Windows 7 machine:

mysql> DROP DATABASE IF EXISTS lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> CREATE DATABASE lspuk;
Query OK, 1 row affected (0.00 sec)

mysql> USE lspuk
Database changed
mysql> CREATE TABLE items
    -> (
    ->     id int not null auto_increment,
    ->     description VARCHAR(30),
    ->     tags VARCHAR(30),
    ->     primary key (id),
    ->     FULLTEXT tags_ftndx (tags)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT INTO items (description,tags) VALUES
    -> ('the first' ,'tag1 tag3 tag4'),
    -> ('the second','tag5 tag1 tag2'),
    -> ('the third' ,'tag5 tag1 tag9'),
    -> ('the fourth','tag5 tag6 tag2'),
    -> ('the fifth' ,'tag4 tag3 tag6'),
    -> ('the sixth' ,'tag2 tag3 tag6');
Query OK, 6 rows affected (0.00 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql>

Look at how the tags are distributed in the table:

mysql> SELECT 'tag1',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag1%' UNION
    -> SELECT 'tag2',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag2%' UNION
    -> SELECT 'tag3',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag3%' UNION
    -> SELECT 'tag4',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag4%' UNION
    -> SELECT 'tag5',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag5%' UNION
    -> SELECT 'tag6',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag6%' UNION
    -> SELECT 'tag9',COUNT(1) tag_count FROM items WHERE tags LIKE '%tag9%';
+------+-----------+
| tag1 | tag_count |
+------+-----------+
| tag1 |         3 |
| tag2 |         3 |
| tag3 |         3 |
| tag4 |         2 |
| tag5 |         3 |
| tag6 |         3 |
| tag9 |         1 |
+------+-----------+
7 rows in set (0.00 sec)

mysql>
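
These counts matter for the natural-language scores, because MyISAM treats a word that appears in at least 50% of the rows as too common to match. A rough way to check which search terms cross that line is sketched below, assuming the space-separated tags column shown above; the alias row_share is made up for illustration.

```sql
-- Sketch only: share of rows containing each searched tag.
-- A row_share of 0.5 or more means the MyISAM natural-language
-- search gives that word no weight.
SELECT t.tag,
       COUNT(i.id) AS tag_count,
       COUNT(i.id) / (SELECT COUNT(*) FROM items) AS row_share
FROM (SELECT 'tag3' AS tag
      UNION ALL SELECT 'tag4'
      UNION ALL SELECT 'tag6') AS t
LEFT JOIN items AS i
       -- Turn the space-separated list into a comma-separated one
       -- so that FIND_IN_SET compares whole tags.
       ON FIND_IN_SET(t.tag, REPLACE(i.tags, ' ', ',')) > 0
GROUP BY t.tag;
```

On the sample data, tag3 and tag6 each occur in 3 of the 6 rows and tag4 in 2 of 6, so only tag4 stays under the threshold.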

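Read this carefully and note the following facts:

  1. Every row has 3 tags
  2. The order of the requested tags, together with how many rows each tag appears in, seems to control the score
  3. If you remove tag4 and run the query, you get no score at all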

    mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6') as score FROM items ORDER BY score DESC;
    +----+-------------+----------------+-------+
    | id | description | tags           | score |
    +----+-------------+----------------+-------+
    |  1 | the first   | tag1 tag3 tag4 |     0 |
    |  2 | the second  | tag5 tag1 tag2 |     0 |
    |  3 | the third   | tag5 tag1 tag9 |     0 |
    |  4 | the fourth  | tag5 tag6 tag2 |     0 |
    |  5 | the fifth   | tag4 tag3 tag6 |     0 |
    |  6 | the sixth   | tag2 tag3 tag6 |     0 |
    +----+-------------+----------------+-------+
    6 rows in set (0.00 sec)
    

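    The evaluation seems to depend on how the tokens are distributed across the rows, so the presence or absence of particular values affects the score. The all-zero result above is consistent with the 50% threshold that MyISAM applies to natural-language searches: tag3 and tag6 each occur in 3 of the 6 rows, so they are treated as too common to match, and only tag4 (2 of 6 rows) can contribute. Note the different scores you get from different scoring modes and tag specifications: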

    mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4') as score FROM items ORDER BY score DESC;
    +----+-------------+----------------+--------------------+
    | id | description | tags           | score              |
    +----+-------------+----------------+--------------------+
    |  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |
    |  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |
    |  2 | the second  | tag5 tag1 tag2 |                  0 |
    |  3 | the third   | tag5 tag1 tag9 |                  0 |
    |  4 | the fourth  | tag5 tag6 tag2 |                  0 |
    |  6 | the sixth   | tag2 tag3 tag6 |                  0 |
    +----+-------------+----------------+--------------------+
    6 rows in set (0.00 sec)
    
    mysql> SELECT *,MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
    +----+-------------+----------------+-------+
    | id | description | tags           | score |
    +----+-------------+----------------+-------+
    |  5 | the fifth   | tag4 tag3 tag6 |     3 |
    |  1 | the first   | tag1 tag3 tag4 |     2 |
    |  6 | the sixth   | tag2 tag3 tag6 |     2 |
    |  4 | the fourth  | tag5 tag6 tag2 |     1 |
    |  2 | the second  | tag5 tag1 tag2 |     0 |
    |  3 | the third   | tag5 tag1 tag9 |     0 |
    +----+-------------+----------------+-------+
    6 rows in set (0.00 sec)
    
    mysql> SELECT *,MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score FROM items ORDER BY score DESC;
    +----+-------------+----------------+-------+
    | id | description | tags           | score |
    +----+-------------+----------------+-------+
    |  5 | the fifth   | tag4 tag3 tag6 |     1 |
    |  1 | the first   | tag1 tag3 tag4 |     0 |
    |  2 | the second  | tag5 tag1 tag2 |     0 |
    |  3 | the third   | tag5 tag1 tag9 |     0 |
    |  4 | the fourth  | tag5 tag6 tag2 |     0 |
    |  6 | the sixth   | tag2 tag3 tag6 |     0 |
    +----+-------------+----------------+-------+
    6 rows in set (0.00 sec)
    
    mysql>
    

    The solution seems to be to compute both scores and order by the BOOLEAN MODE score first, then by the non-BOOLEAN MODE score, as follows:

    SELECT *,
    MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
    MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
    FROM items ORDER BY score2 DESC, score1 DESC;
    

    Here is the result against your sample data:

    mysql> SELECT *,
        -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
        -> MATCH(tags) AGAINST ('+tag3 +tag6 +tag4' IN BOOLEAN MODE) as score2
        -> FROM items ORDER BY score2 DESC, score1 DESC;
    +----+-------------+----------------+--------------------+--------+
    | id | description | tags           | score1             | score2 |
    +----+-------------+----------------+--------------------+--------+
    |  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      1 |
    |  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      0 |
    |  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
    |  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
    |  4 | the fourth  | tag5 tag6 tag2 |                  0 |      0 |
    |  6 | the sixth   | tag2 tag3 tag6 |                  0 |      0 |
    +----+-------------+----------------+--------------------+--------+
    6 rows in set (0.00 sec)
    
    mysql>
    

    Or you can try it without the plus signs:

    mysql> SELECT *,
        -> MATCH(tags) AGAINST ('tag3 tag6 tag4') as score1,
        -> MATCH(tags) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) as score2
        -> FROM items ORDER BY score2 DESC, score1 DESC;
    +----+-------------+----------------+--------------------+--------+
    | id | description | tags           | score1             | score2 |
    +----+-------------+----------------+--------------------+--------+
    |  5 | the fifth   | tag4 tag3 tag6 | 0.6700310707092285 |      3 |
    |  1 | the first   | tag1 tag3 tag4 | 0.6700310707092285 |      2 |
    |  6 | the sixth   | tag2 tag3 tag6 |                  0 |      2 |
    |  4 | the fourth  | tag5 tag6 tag2 |                  0 |      1 |
    |  2 | the second  | tag5 tag1 tag2 |                  0 |      0 |
    |  3 | the third   | tag5 tag1 tag9 |                  0 |      0 |
    +----+-------------+----------------+--------------------+--------+
    6 rows in set (0.00 sec)
    
    mysql>
    

    Either way, you have to use both BOOLEAN MODE and non-BOOLEAN MODE.
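
For the second part of the question (falling back from tags to description), one possible single-query approach is sketched below. It assumes a FULLTEXT index is also added on description; the index name desc_ftndx and the aliases tag_score and desc_score are made up for illustration. Rows that match on tags sort first, and rows that match only on description follow.

```sql
-- Assumption: description gets its own FULLTEXT index (hypothetical name).
ALTER TABLE items ADD FULLTEXT desc_ftndx (description);

-- Sketch only: score both columns, keep rows that match either one,
-- and rank tag matches ahead of description-only matches.
SELECT *,
       MATCH(tags)        AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS tag_score,
       MATCH(description) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE) AS desc_score
FROM items
WHERE MATCH(tags)        AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE)
   OR MATCH(description) AGAINST ('tag3 tag6 tag4' IN BOOLEAN MODE)
ORDER BY tag_score DESC, desc_score DESC;
```

If description matches should appear only when no tag matches at all, running two separate queries from the application is probably simpler.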

Answer 2: (score: 0)

Change the ordering to ORDER BY score DESC, id DESC.
When the score values are equal, the row with id 5 will then be shown first.
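
Applied to the query from the question, that suggestion might look like the sketch below. Keep in mind that id DESC only breaks ties between rows whose scores are equal (ids 1 and 5 in the sample output above); it does not change how the scores themselves are computed.

```sql
-- Sketch only: same natural-language score as in the question,
-- with id as a descending tie-breaker for equal scores.
SELECT *,
       MATCH(tags) AGAINST ('tag3 tag6 tag4') AS score
FROM items
ORDER BY score DESC, id DESC;
```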