MySQL: sporadic MATCH AGAINST behaviour with a unique index

Date: 2017-07-24 13:14:00

Tags: mysql full-text-search innodb unique-index sqlfiddle

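When a unique key is added to a table used in a multi-table full-text boolean search, the results cycle through one of three arbitrary states, only one of which is correct.

Bear this in mind when checking the sqlfiddle below, because the query may work correctly at first. In that case, add a space in the left-hand panel, then rebuild and re-run; it should then be broken (though the behaviour is very sporadic).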

http://sqlfiddle.com/#!9/8d95ba/18

Here is the problematic query:

SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`
  FROM `item` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

Simple enough. However, once the following unique index is added:

ALTER TABLE `item_with_unique` ADD UNIQUE INDEX `unique_item_group` (`group_id`, `name`)

the results cycle arbitrarily between these three states:

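  1. Returns all rows, as if there were no WHERE clause
  2. Returns only the alias matches, as if the WHERE clause had no OR part
  3. Returns the correct results (in my experience, this is the rarest outcome)

The behaviour seems to stay consistent in whichever of these three states it is in until the query is changed in some small way (such as adding parentheses) or the schema is rebuilt, at which point it may change.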

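Is this a limitation described somewhere in the MySQL documentation that I have missed? Is it a bug? Or am I simply doing something obviously wrong?

MySQL version 5.6.35 (the version on sqlfiddle at the time of writing).

In case the link dies, here is the sqlfiddle for posterity: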

    CREATE TABLE `group` (
      `group_id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      `name` VARCHAR(256),
      FULLTEXT INDEX `search` (`name`)
    ) ENGINE = InnoDB;
    
    CREATE TABLE `group_alias` (
      `group_id` INT UNSIGNED NOT NULL,
      `alias` VARCHAR(256),
      CONSTRAINT `alias_group_id`
        FOREIGN KEY (`group_id`)
        REFERENCES `group` (`group_id`),
      FULLTEXT INDEX `search` (`alias`)
    ) ENGINE = InnoDB;
    
    CREATE TABLE `item` (
      `item_id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
      `group_id` INT UNSIGNED,
      `name` VARCHAR(255) NOT NULL,
      CONSTRAINT `item_group_id`
        FOREIGN KEY (`group_id`)
        REFERENCES `group` (`group_id`),
      FULLTEXT INDEX `search` (`name`)
    ) ENGINE = InnoDB;
    
    CREATE TABLE `item_with_unique` LIKE `item`;
    ALTER TABLE `item_with_unique` ADD UNIQUE INDEX `unique_item_group` (`group_id`, `name`);
    
    INSERT INTO `group` (`group_id`, `name`) VALUES (1, 'Thompson');
    INSERT INTO `group` (`group_id`, `name`) VALUES (2, 'MacDonald');
    INSERT INTO `group` (`group_id`, `name`) VALUES (3, 'Stewart');
    
    INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (1, 'Tomson');
    INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (2, 'Something');
    INSERT INTO `group_alias` (`group_id`, `alias`) VALUES (3, 'MacStewart');
    
    INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (1, 1, 'MacTavish');
    INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (2, 1, 'MacTavish; Red');
    INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (3, 2, 'MacAgnew');
    INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (4, 3, 'Spider');
    INSERT INTO `item` (`item_id`, `group_id`, `name`) VALUES (5, 2, 'blahblah');
    
    INSERT INTO `item_with_unique` SELECT * FROM `item`;
    
    
    SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
    IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
    IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
      FROM `item` `i`
      JOIN `group_alias` `g_a` USING (group_id)
        WHERE
          MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
        OR
          MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);
    
    SELECT "Same query, using table with unique index (NOTE: sporadically this is actually correct, in such case, skip to bottom notes)";
    SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
    IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
    IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
      FROM `item_with_unique` `i`
      JOIN `group_alias` `g_a` USING (group_id)
        WHERE
          MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
        OR
          MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);
    
    SELECT "Union of the two OR match conditions separately (expected result from second query)";
    SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
    IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
    IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
      FROM `item_with_unique` `i`
      JOIN `group_alias` `g_a` USING (group_id)
        WHERE
          MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    UNION
    SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`,
    IF(MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `group_match`,
    IF(MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE), 1, 0) AS `item_match`
      FROM `item_with_unique` `i`
      JOIN `group_alias` `g_a` USING (group_id)
        WHERE
          MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);
    
    SELECT "Now rebuild the schema (add a newline somewhere so sqlfiddle thinks it has changed) and observe the results of the second query. It may take multiple attempts, but it usually cycles between 3 states:";
    SELECT "1: Returns ALL results as if there were no conditions (5 rows)";
    SELECT "2: Returns results as if there were no second part to the OR condition (1 row)";
    SELECT "3: Returns the correct results (rarely)";
    

2 answers:

Answer 0 (score: 0)

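If your names and aliases are single words, and you are checking against the whole value or the leading part of it, then FULLTEXT is not the type of index you need.

A plain INDEX(name) combined with name LIKE 'Mac%' will work very efficiently.

If you have long phrases with many words, and "MacDonald" could appear in the middle of one, then FULLTEXT with MATCH ... AGAINST is the right way to go.

With either type of index,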
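As a minimal sketch of that suggestion against the question's `item` table (the index name `idx_item_name` is made up for illustration, and this only covers values that begin with the search term):

```sql
-- Sketch only: idx_item_name is a hypothetical index name.
-- An ordinary B-tree index can serve a leading-prefix LIKE as a range scan.
ALTER TABLE `item` ADD INDEX `idx_item_name` (`name`);

-- Matches values that START with 'Mac'. Unlike 'Mac*' in boolean mode,
-- it will not find 'Mac...' as a later word inside a longer phrase.
SELECT `item_id`, `name`
  FROM `item`
  WHERE `name` LIKE 'Mac%';
```

Note that a pattern with a leading wildcard, such as '%Mac%', could not use this index for a range scan.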

WHERE table1 ...
   OR table2 ...

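is inefficient. Roughly speaking, the optimizer has to perform a "cross join" to get every combination of rows between the two tables, and then check which of them satisfy one or the other of the MATCH/LIKE conditions.

Perhaps you have "over-normalized" the data? Couldn't name and alias live together in the same table? The query would be faster, and there would be optimization techniques available to make it faster still. With only 1K rows, what you have now will be noticeably slow; what I am suggesting can be optimized well beyond millions, maybe even billions, of rows.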
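One possible shape for such a combined table is sketched below. The table `item_search` and its layout are assumptions made purely for illustration, not something from the question, and the table would have to be kept in sync with `item` and `group_alias` by the application:

```sql
-- Hypothetical denormalised table: one row per (item, alias) pair.
CREATE TABLE `item_search` (
  `item_id` INT UNSIGNED NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `alias` VARCHAR(256),
  -- A single FULLTEXT index covering both searchable columns.
  FULLTEXT INDEX `search` (`name`, `alias`)
) ENGINE = InnoDB;

-- Populate it from the existing tables in the question's schema.
INSERT INTO `item_search` (`item_id`, `name`, `alias`)
  SELECT `i`.`item_id`, `i`.`name`, `g_a`.`alias`
    FROM `item` `i`
    JOIN `group_alias` `g_a` USING (group_id);

-- One MATCH over one table; the column list must match the FULLTEXT index.
SELECT `item_id`, `alias` AS `group`, `name`
  FROM `item_search`
  WHERE MATCH (`name`, `alias`) AGAINST ('Mac*' IN BOOLEAN MODE);
```

With this layout there is no OR spanning two tables, so the search can be answered from a single full-text index.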

Answer 1 (score: 0)

Try using IGNORE INDEX in your statement:

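-- The hint must name an index that exists on the table being read;
-- in the question's schema, unique_item_group is defined on item_with_unique.
SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`
  FROM `item_with_unique` `i`
  IGNORE INDEX (unique_item_group)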
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);

MySQL seems to be clumsy enough to sporadically pick unique_item_group when running the full-text search.
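One way to check whether the optimizer is picking the unique index is to prefix the query with EXPLAIN. This is only a sketch using the question's `item_with_unique` table; the plan you see will depend on the server version and on which of the sporadic states the query is in:

```sql
-- The "key" column of the EXPLAIN output names the index chosen per table.
-- Compare the plan with and without the IGNORE INDEX hint.
EXPLAIN
SELECT `i`.`item_id`, `g_a`.`alias` AS `group`, `i`.`name` AS `name`
  FROM `item_with_unique` `i`
  JOIN `group_alias` `g_a` USING (group_id)
    WHERE
      MATCH (`g_a`.`alias`) AGAINST ('Mac*' IN BOOLEAN MODE)
    OR
      MATCH (`i`.`name`) AGAINST ('Mac*' IN BOOLEAN MODE);
```

If `unique_item_group` shows up in the key column when the results are wrong and disappears once the hint is added, that would support this explanation.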