On a table with 5M+ rows

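Date: 2016-07-21 05:16:59

Tags: php mysql full-text-search union

I have a table with about 5 million rows of data (articles). I use the query below to run a full-text search on article titles in two different languages. The problem is that it takes about 15 seconds to execute. MySQL version: 5.6.29-log

Here is the query: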

SELECT `id`, `title`, `title_fa` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE 1 AND MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
    UNION
    SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`unique` AS `unique`, `p`.`date` AS `date` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE 1 AND MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10;

Here is the table structure:

CREATE TABLE `articles` (
  `id` int(10) unsigned NOT NULL,
  `title` text COLLATE utf8_persian_ci NOT NULL,
  `title_fa` text COLLATE utf8_persian_ci NOT NULL,
  `description` text COLLATE utf8_persian_ci NOT NULL,
  `description_fa` text COLLATE utf8_persian_ci NOT NULL,
  `date` date NOT NULL,
  `unique` tinytext COLLATE utf8_persian_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;

ALTER TABLE `articles`
  ADD PRIMARY KEY (`id`),
  ADD KEY `unique` (`unique`(128)),
  ADD FULLTEXT KEY `TtlDesc` (`title`,`description`),
  ADD FULLTEXT KEY `Title` (`title`),
  ADD FULLTEXT KEY `faTtlDesc` (`title_fa`,`description_fa`),
  ADD FULLTEXT KEY `faTitle` (`title_fa`),
  MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT;

First improvement step:

Searching on SO, I found this post:

Combining UNION and LIMIT operations in MySQL query

Using the suggested approach, I changed my query as follows:

SELECT `id`, `title`, `title_fa` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) LIMIT 0,100
    UNION
    SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) LIMIT 0,100) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

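The performance is amazing: the query executes in 0.04 seconds. The problem is the ordering. I would like the most recent articles listed first, but this query cannot do that. I am also not sure how to retrieve and display the next set of results (i.e. the next 10 results, the second page).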
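A note on why the ordering fails, with a possible sketch: the per-branch LIMIT 0,100 above has no ORDER BY, so each branch returns an arbitrary 100 matches and the outer sort never sees newer rows outside them. The usual top-N pattern is to parenthesize each branch, sort it by date, and keep offset + page_size rows, then let the outer query skip the offset. The numbers below (page 2, 10 rows per page, so 20 rows per branch) are illustrative. The query has not been tested against 5.6.29, and the sort inside each branch may cost some of the 0.04-second speed.

```sql
-- Sketch for page 2 (offset 10, page size 10); illustrative values only.
-- Each parenthesized branch keeps its newest offset + page_size = 20 matches;
-- the outer query then merges the branches, sorts, and skips the first 10 rows.
SELECT `id`, `title`, `title_fa` FROM
    ((SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique`
      FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique`
      WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
      ORDER BY `p`.`date` DESC LIMIT 0,20)
    UNION
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique`
      FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique`
      WHERE MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)
      ORDER BY `p`.`date` DESC LIMIT 0,20)) AS `subQuery`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 10,10;
```

One caveat: if several of the 20 rows in a branch share the same `unique` value, GROUP BY collapses them and a page can come up short, so the per-branch limit may need some headroom.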

Second improvement step:

Searching further on SO, I came across this:

SQL Query - Using Order By in UNION

My query now looks like this:

SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
    (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON  `p`.`unique` =  `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)  ORDER BY `p`.`date` DESC LIMIT 0,20) AS `subQueryE`
    UNION ALL
SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
    (SELECT `f`.`id` AS `id`, `f`.`title` AS `title`, `f`.`title_fa` AS `title_fa`, `f`.`date` AS `date`, `f`.`unique` AS `unique` FROM `articles` `f` LEFT JOIN `authors` `a` ON  `f`.`unique` =  `a`.`unique` WHERE MATCH (`f`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)  ORDER BY `f`.`date` DESC LIMIT 0,20) AS `subQueryF`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

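Performance is better than the original but still unsatisfactory, at about 7 seconds. It also introduces another problem: despite the GROUP BY unique, the results still contain duplicate rows.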
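The duplicates are most likely a scoping effect. In an unparenthesized UNION, a trailing GROUP BY belongs to the last SELECT only, while the trailing ORDER BY and LIMIT apply to the whole union, so in the query above only the `subQueryF` rows are grouped. A minimal sketch of one way around it is to wrap the whole UNION ALL in one more derived table. The alias `combined` is made up for this example, and like the original queries it relies on ONLY_FULL_GROUP_BY being disabled.

```sql
-- Sketch: the two branches from the query above, unchanged, wrapped in an
-- extra derived table (hypothetical alias `combined`) so that GROUP BY
-- sees the rows of both branches instead of only the second one.
SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM (
    SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
        (SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) ORDER BY `p`.`date` DESC LIMIT 0,20) AS `subQueryE`
    UNION ALL
    SELECT `id`, `title`, `title_fa`, `unique`, `date` FROM
        (SELECT `f`.`id` AS `id`, `f`.`title` AS `title`, `f`.`title_fa` AS `title_fa`, `f`.`date` AS `date`, `f`.`unique` AS `unique` FROM `articles` `f` LEFT JOIN `authors` `a` ON `f`.`unique` = `a`.`unique` WHERE MATCH (`f`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) ORDER BY `f`.`date` DESC LIMIT 0,20) AS `subQueryF`
) AS `combined`
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10;
```

This only addresses the duplicates. It is not expected to change the 7-second cost of sorting every match inside each branch.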

Third step:

I ran another test with the following query, hoping for better results:

SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique` FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique` WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE) OR MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE) GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10

But the execution time was terrible: more than 100 seconds.

Any help is much appreciated. Thanks in advance.
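A possible explanation, offered as an assumption rather than a measurement: the MySQL manual states that the index merge optimization does not apply to full-text indexes, so an OR across MATCH conditions on two different FULLTEXT indexes probably cannot be resolved from the indexes and falls back to scanning the table. Running EXPLAIN on the same statement should confirm or refute this on the real data.

```sql
-- Diagnostic sketch: inspect the plan of the single-query variant.
-- If the `type` column shows ALL for table `p`, every row is scanned and
-- MATCH is evaluated row by row instead of through a FULLTEXT index.
EXPLAIN
SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`, `p`.`date` AS `date`, `p`.`unique` AS `unique`
FROM `articles` `p` LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique`
WHERE MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
   OR MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)
GROUP BY `unique` ORDER BY `date` DESC LIMIT 0,10;
```

Comparing this plan with the plans of the individual UNION branches (where `type` would be expected to show fulltext) would indicate whether the OR is what defeats the indexes.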

0 answers:

No answers