MySQL 8 window functions + full-text search

Date: 2017-10-20 09:09:12

Tags: mysql sql mysql-8.0

I'm using mysql Ver 8.0.3-rc for Linux on x86_64 (MySQL Community Server (GPL)).

Create a table with a full-text index on the name column:

CREATE TABLE `title` (
  `id` smallint(4) unsigned NOT NULL PRIMARY KEY,
  `name` text COLLATE utf8_unicode_ci,
  FULLTEXT idx (name) WITH PARSER ngram
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Insert some data:

insert into `title` values(14,"I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home).");
insert into `title` values(23,"I've never been to the area.");
insert into `title` values(43,"Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?");
insert into `title` values(125,"Don't really have much planned other than the Falls and the game.");

When I run:

select
    id,
    round(MATCH (name) AGAINST ('other than the'),2) scope
from title;

the result is as expected:

id  | scope
----------
14  | 0.43
23  | 0.23
43  | 0.12
125 | 1.15

With a classic aggregate query (GROUP BY style), everything is fine as well:

select
    max(scope),
    min(scope),
    sum(scope)
from
(
    select id, round(MATCH (name) AGAINST ('other than the'),2) scope
    from title
) a;

The result is OK:

max  |  min | sum
----------------
1.15 | 0.12 | 1.96

But when I try to use window functions instead, I don't understand the result:

select
    id,
    max(scope) over(),
    min(scope) over(),
    sum(scope) over()
from
(
    select id, round(MATCH (name) AGAINST ('other than the'),2) scope
    from title
) a;

I get a strange result (why?):

id | max  |  min | sum
------------------------
14 | 1.15 | 1.15 |  4.60
23 | 1.15 | 1.15 |  4.60
43 | 1.15 | 1.15 |  4.60
125| 1.15 | 1.15 |  4.60

I expected a result similar to the classic aggregate query, like this:

id | max  |  min | sum
------------------------
14 | 1.15 | 0.12 |  1.96
23 | 1.15 | 0.12 |  1.96
43 | 1.15 | 0.12 |  1.96
125| 1.15 | 0.12 |  1.96

Is this a bug in mysql Ver 8.0.3-rc, or is my query incorrect? Thanks!

2 answers:

Answer 0: (score: 0)

It looks like you have found a bug in MySQL. Please report it at bugs.mysql.com.

I ran the following script in both MySQL and MariaDB (without WITH PARSER ngram, because MariaDB does not currently support it; see Add "ngram" support to MariaDB). The results:

MySQL:

mysql> SELECT VERSION();
+--------------+
| VERSION()    |
+--------------+
| 8.0.3-rc-log |
+--------------+
1 row in set (0.00 sec)

mysql> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)

mysql> CREATE TABLE `title` (
    ->   `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
    ->   `name` TEXT COLLATE utf8_unicode_ci,
    ->   FULLTEXT idx (`name`) -- WITH PARSER ngram
    -> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `title`
    -> VALUES
    ->   (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
    ->   (23, "I've never been to the area."),
    ->   (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
    ->   (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->   MAX(`scope`),
    ->   MIN(`scope`),
    ->   SUM(`scope`)
    -> FROM
    -> (
    ->   SELECT
    ->     `id`,
    ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->   FROM `title`
    -> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
|         0.72 |         0.00 |         0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT
    ->   `id`,
    ->   MAX(`scope`) OVER(),
    ->   MIN(`scope`) OVER(),
    ->   SUM(`scope`) OVER()
    -> FROM
    -> (
    ->   SELECT
    ->     `id`,
    ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->   FROM `title`
    -> ) `a`;
+-----+---------------------+---------------------+---------------------+
| id  | MAX(`scope`) OVER() | MIN(`scope`) OVER() | SUM(`scope`) OVER() |
+-----+---------------------+---------------------+---------------------+
|  14 |                0.72 |                0.72 |                2.88 |
|  23 |                0.72 |                0.72 |                2.88 |
|  43 |                0.72 |                0.72 |                2.88 |
| 125 |                0.72 |                0.72 |                2.88 |
+-----+---------------------+---------------------+---------------------+
4 rows in set (0.00 sec)

MariaDB:

MariaDB[_]> SELECT VERSION();
+----------------------------------------+
| VERSION()                              |
+----------------------------------------+
| 10.2.6-MariaDB-10.2.6+maria~jessie-log |
+----------------------------------------+
1 row in set (0.00 sec)

MariaDB[_]> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)

MariaDB[_]> CREATE TABLE `title` (
         ->   `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
         ->   `name` TEXT COLLATE utf8_unicode_ci,
         ->   FULLTEXT idx (`name`) -- WITH PARSER ngram
         -> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)

MariaDB[_]> INSERT INTO `title`
         -> VALUES
         ->   (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
         ->   (23, "I've never been to the area."),
         ->   (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
         ->   (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0

MariaDB[_]> SELECT
         ->   MAX(`scope`),
         ->   MIN(`scope`),
         ->   SUM(`scope`)
         -> FROM
         -> (
         ->   SELECT
         ->     `id`,
         ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
         ->   FROM `title`
         -> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
|         0.72 |         0.00 |         0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)

MariaDB[_]> SELECT
         ->   `id`,
         ->   MAX(`scope`) OVER(),
         ->   MIN(`scope`) OVER(),
         ->   SUM(`scope`) OVER()
         -> FROM
         -> (
         ->   SELECT
         ->     `id`,
         ->     ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
         ->   FROM `title`
         -> ) `a`;
+-----+--------------+--------------+--------------+
| id  | MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+-----+--------------+--------------+--------------+
|  14 |         0.72 |         0.00 |         0.72 |
|  23 |         0.72 |         0.00 |         0.72 |
|  43 |         0.72 |         0.00 |         0.72 |
| 125 |         0.72 |         0.00 |         0.72 |
+-----+--------------+--------------+--------------+
4 rows in set (0.00 sec)

Answer 1: (score: 0)

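Regarding wchiquito's answer: you are right, there was a bug. It has been fixed since it was reported. With the fix, MySQL returns this result for the window query: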

mysql> SELECT
    ->        `id`,
    ->        MAX(`scope`) OVER() `max`,
    ->        MIN(`scope`) OVER() `min`,
    ->        SUM(`scope`) OVER() `sum`
    ->      FROM
    ->      (
    ->        SELECT
    ->          `id`,
    ->          ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->        FROM `title`
    ->      ) `a`;
+-----+------+------+------+
| id  | max  | min  | sum  |
+-----+------+------+------+
|  14 | 0.72 | 0.00 | 0.72 |
|  23 | 0.72 | 0.00 | 0.72 |
|  43 | 0.72 | 0.00 | 0.72 |
| 125 | 0.72 | 0.00 | 0.72 |
+-----+------+------+------+
4 rows in set (0,01 sec)

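This still differs from the MariaDB result you quoted, but I believe the MySQL answer above is correct: because the window specification is empty, the window function should operate on all rows of the result set for every row, i.e. the window function call should produce the same value for every result set row.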
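To see the empty-window semantics in isolation, here is a minimal sketch that takes full-text search out of the picture. It assumes MySQL 8.0 (for CTE and window function support) and feeds the four rounded scores from the run above in as literals; the CTE named a is only a stand-in for the derived table in the query.

```sql
-- Stand-in for the derived table `a`: the rounded MATCH scores from the run
-- above, supplied as literals (an assumption made purely for illustration).
WITH a (id, scope) AS (
  SELECT 14,  0.00 UNION ALL
  SELECT 23,  0.00 UNION ALL
  SELECT 43,  0.00 UNION ALL
  SELECT 125, 0.72
)
SELECT
  id,
  MAX(scope) OVER () AS `max`,  -- empty window: all four rows
  MIN(scope) OVER () AS `min`,
  SUM(scope) OVER () AS `sum`
FROM a
ORDER BY id;
-- Every row reports max = 0.72, min = 0.00, sum = 0.72,
-- the same values the plain aggregate query returns in its single row.
```

If a build returns min = max for every row and a sum equal to four times the maximum when the scores come from MATCH ... AGAINST, yet returns the values above for literals, that would point at the full-text expression being re-evaluated incorrectly rather than at the window functions themselves.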

If you partition the result set the way a GROUP BY query would (see PARTITION BY a.id below), you will see this result:

mysql> SELECT
    ->        `id`,
    ->        MAX(`scope`) OVER(PARTITION BY a.id) `max`,
    ->        MIN(`scope`) OVER(PARTITION BY a.id) `min`,
    ->        SUM(`scope`) OVER(PARTITION BY a.id) `sum`
    ->      FROM
    ->      (
    ->        SELECT
    ->          `id`,
    ->          ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
    ->        FROM `title`
    ->      ) `a`;
+-----+------+------+------+
| id  | max  | min  | sum  |
+-----+------+------+------+
|  14 | 0.00 | 0.00 | 0.00 |
|  23 | 0.00 | 0.00 | 0.00 |
|  43 | 0.00 | 0.00 | 0.00 |
| 125 | 0.72 | 0.72 | 0.72 |
+-----+------+------+------+
4 rows in set (0,00 sec)

because each row is its own partition. This is the same as what you quoted for MariaDB without PARTITION BY.
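The difference between the two window specifications can be put side by side in one query. This is only a sketch under the same assumptions as before: MySQL 8.0, with the rounded scores supplied as literals in a CTE named a instead of being computed by MATCH ... AGAINST.

```sql
-- Literal stand-in for the derived table `a` (illustrative values taken
-- from the output above, not recomputed by full-text search).
WITH a (id, scope) AS (
  SELECT 14,  0.00 UNION ALL
  SELECT 23,  0.00 UNION ALL
  SELECT 43,  0.00 UNION ALL
  SELECT 125, 0.72
)
SELECT
  id,
  SUM(scope) OVER ()                AS sum_all,     -- one window over all rows
  SUM(scope) OVER (PARTITION BY id) AS sum_per_id   -- one window per id
FROM a
ORDER BY id;
-- sum_all is 0.72 on every row; sum_per_id is 0.00 for ids 14, 23 and 43
-- and 0.72 for id 125, because id is unique and each partition holds one row.
```

Partitioning by a unique key therefore just echoes each row's own value, which is rarely what is wanted; the empty OVER () is the form that matches the totals of the plain aggregate query.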