I have a large table like this:
CREATE TABLE IF NOT EXISTS `object_search` (
`keyword` varchar(40) COLLATE latin1_german1_ci NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
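It contains about 39 million rows (using more than 1 GB of space) and holds the index data for the 1 million records in the object table, which object_id points to.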
Searching it with a query like
SELECT object_id, COUNT(object_id) AS hits
FROM object_search
WHERE keyword = 'woman' OR keyword = 'house'
GROUP BY object_id
HAVING hits = 2
is already much faster than running a LIKE search on the combined keywords field of the object table, but it still takes 1 minute.
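The GROUP BY / HAVING query above returns only the objects that carry every requested keyword. A minimal sketch of the same query shape, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the sample rows and the helper names build_demo_db and search_all are hypothetical:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for the object_search table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_search ("
        " keyword TEXT NOT NULL,"
        " object_id INTEGER NOT NULL,"
        " PRIMARY KEY (keyword, object_id))"
    )
    rows = [
        ("woman", 1), ("house", 1),                 # object 1: both keywords
        ("woman", 2),                               # object 2: only 'woman'
        ("house", 3), ("tree", 3),                  # object 3: no 'woman'
        ("woman", 4), ("house", 4), ("tree", 4),    # object 4: all three
    ]
    conn.executemany("INSERT INTO object_search VALUES (?, ?)", rows)
    return conn


def search_all(conn: sqlite3.Connection, keywords: list[str]) -> list[int]:
    """Return the object_ids that carry every keyword (AND semantics).

    The primary key guarantees one row per (keyword, object_id), so an
    object matching all N distinct keywords produces exactly N rows.
    """
    kws = sorted(set(keywords))  # duplicates would break the hits = N test
    placeholders = ", ".join("?" for _ in kws)
    sql = (
        "SELECT object_id FROM object_search "
        f"WHERE keyword IN ({placeholders}) "  # same as keyword = ? OR keyword = ?
        "GROUP BY object_id "
        "HAVING COUNT(object_id) = ? "
        "ORDER BY object_id"
    )
    return [row[0] for row in conn.execute(sql, (*kws, len(kws)))]
```

For example, search_all(build_demo_db(), ["woman", "house"]) keeps objects 1 and 4 only. The trick depends on the (keyword, object_id) primary key; without it, duplicate rows would inflate the hit count.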
Its EXPLAIN output looks like this:
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
| 1 | SIMPLE | search | ref | PRIMARY | PRIMARY | 42 | const | 345180 | 100.00 | Using where; Using index |
+----+-------------+--------+------+---------------+---------+---------+-------+--------+----------+--------------------------+
The full EXPLAIN, with the object, object_color and object_locale tables joined and the query above run in a subquery to avoid overhead, looks like this:
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 182544 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | search.object_id | 1 | 100.00 | |
| 2 | DERIVED | search | ref | PRIMARY | PRIMARY | 42 | | 345180 | 100.00 | Using where; Using index |
+----+-------------+-------------------+--------+---------------+-----------+---------+------------------+--------+----------+---------------------------------+
My primary goal is to get the scan done in 1 or 2 seconds.
So, are there any other techniques to speed up the keyword search?
Update 2013-08-06:
After applying most of Neville K's suggestions, I now have the following setup:
CREATE TABLE `object_search_keyword` (
`keyword_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`keyword` varchar(64) COLLATE latin1_german1_ci NOT NULL,
PRIMARY KEY (`keyword_id`),
FULLTEXT KEY `keyword_ft` (`keyword`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
CREATE TABLE `object_search` (
`keyword_id` int(10) unsigned NOT NULL,
`object_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`keyword_id`,`object_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The EXPLAIN for the new query looks like this:
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | object_color | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | object | eq_ref | PRIMARY | PRIMARY | 4 | object_search.object_id | 1 | 100.00 | |
| 1 | PRIMARY | locale | eq_ref | object_id | object_id | 4 | object_search.object_id | 1 | 100.00 | |
| 2 | DERIVED | <derived4> | system | NULL | NULL | NULL | NULL | 1 | 100.00 | |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 24381 | 100.00 | |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | DERIVED | object_keyword | fulltext | PRIMARY,keyword_ft | keyword_ft | 0 | | 1 | 100.00 | Using where; Using temporary; Using filesort |
| 3 | DERIVED | object_search | ref | PRIMARY | PRIMARY | 4 | object_keyword.keyword_id | 2190225 | 100.00 | Using index |
+----+-------------+----------------+----------+--------------------+------------+---------+---------------------------+---------+----------+----------------------------------------------+
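The many derived tables come from the keyword comparison subquery being nested inside another subquery that does nothing but count the returned rows: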
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(woman) +(house)')
GROUP BY search.object_id HAVING hits = 2
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS search
INNER JOIN object AS object ON (search.object_id = object.object_id)
LEFT JOIN object_color AS object_color ON (search.object_id = object_color.object_id)
LEFT JOIN object_locale AS locale ON (search.object_id = locale.object_id)
ORDER BY timestamp_upload DESC
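The query above actually runs in about 6 seconds, since it searches for two keywords. The more keywords I search for, the faster the search gets.
Is there any way to optimize this further?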
Update 2013-08-07:
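The bottleneck is almost certainly the appended ORDER BY clause. Without it, the query executes in less than a second.
So, is there a way to sort the results faster? Any suggestion is welcome, even a hackish one that needs post-processing somewhere else.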
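Update 2013-08-07, later that day:
Ladies and gentlemen: nesting the WHERE and ORDER BY clauses in one more layer of subquery, so that they do not have to deal with tables they do not need, doubled the performance again: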
SELECT wowrapper.*, locale.title
FROM (
SELECT SQL_NO_CACHE object.object_id, ..., @rn AS numrows
FROM (
SELECT *, @rn := @rn + 1
FROM (
SELECT SQL_NO_CACHE search.object_id, COUNT(search.object_id) AS hits
FROM object_keyword AS kwd
INNER JOIN object_search AS search ON (kwd.keyword_id = search.keyword_id)
WHERE MATCH (kwd.keyword) AGAINST ('+(frau)')
GROUP BY search.object_id HAVING hits = 1
) AS numrowswrapper
CROSS JOIN (SELECT @rn := 0) CONST
) AS search
INNER JOIN object AS object ON (search.object_id = object.object_id)
LEFT JOIN object_color AS color ON (search.object_id = color.object_id)
WHERE 1
ORDER BY object.object_id DESC
) AS wowrapper
LEFT JOIN object_locale AS locale ON (wowrapper.object_id = locale.object_id)
LIMIT 0,48
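A search that used to take 12 seconds (single keyword, about 200K results) now takes 6, and a two-keyword search that took 6 seconds (60K results) now takes about 3.5 seconds.
That is already a huge improvement, but is there any chance to push it further?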
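The speed-up above comes from sorting narrow ID rows first and joining the wide per-object tables afterwards, a pattern often called a deferred join. A minimal sketch of a variant of that shape, in which the LIMIT is also applied before the wide join, using SQLite through Python's sqlite3 as a stand-in for MySQL. The schema is trimmed; the sample data and the helpers build_demo_db and fetch_page are hypothetical. It does not reproduce MySQL's @rn row counter or the timings above:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical sample data: objects 1..100, every even one tagged 'frau'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE object_search (
            keyword TEXT NOT NULL, object_id INTEGER NOT NULL,
            PRIMARY KEY (keyword, object_id));
        CREATE TABLE object (object_id INTEGER PRIMARY KEY);
        CREATE TABLE object_locale (object_id INTEGER PRIMARY KEY, title TEXT);
        """
    )
    for oid in range(1, 101):
        conn.execute("INSERT INTO object VALUES (?)", (oid,))
        conn.execute("INSERT INTO object_locale VALUES (?, ?)", (oid, f"title {oid}"))
        if oid % 2 == 0:
            conn.execute("INSERT INTO object_search VALUES ('frau', ?)", (oid,))
    return conn


PAGE_SQL = """
SELECT page.object_id, locale.title
FROM (
    -- Step 1: filter, sort and page using only narrow object_id values.
    SELECT o.object_id
    FROM object_search AS s
    JOIN object AS o ON o.object_id = s.object_id
    WHERE s.keyword = ?
    ORDER BY o.object_id DESC
    LIMIT ? OFFSET ?
) AS page
-- Step 2: fetch the wide per-object columns only for the rows on this page.
LEFT JOIN object_locale AS locale ON locale.object_id = page.object_id
-- Re-apply the order: a join does not promise to keep the subquery's order.
ORDER BY page.object_id DESC
"""


def fetch_page(conn: sqlite3.Connection, keyword: str,
               limit: int = 48, offset: int = 0) -> list[tuple]:
    """Return one page of (object_id, title) rows, newest object_id first."""
    return conn.execute(PAGE_SQL, (keyword, limit, offset)).fetchall()
```

With the sample data, fetch_page(build_demo_db(), "frau") returns 48 rows running from object 100 down to object 6, and the only rows that touch object_locale are the 48 on the page.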
Update 2013-08-08, early in the day:
I undid the last nested variant of the query, because it actually slowed down the other variants of it...
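I am now trying a few other things: a different table layout using MyISAM and a FULLTEXT index, to get a dedicated search table with a combined keyword field (comma-separated values in a TEXT field).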
Update 2013-08-08:
Well, the simple fulltext index did not really help.
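Back on the previous setup, the only thing that blocks is the ORDER BY (it uses a temporary table and a filesort). Without it, the search completes in under a second!
So basically all that remains is:
How can I make the ORDER BY clause run faster by eliminating the temporary table?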
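A sort step can only be skipped when the database can read rows in the requested order, typically from an index on the sort column. A minimal sketch of that principle in SQLite through Python's sqlite3, where an explicit sort shows up in EXPLAIN QUERY PLAN as "USE TEMP B-TREE FOR ORDER BY" (MySQL's EXPLAIN shows the equivalent as "Using filesort"). The index name idx_object_upload and the helper names are hypothetical. This does not by itself fix the query in the question: there the sort runs over the derived search result, so whether MySQL can read object in timestamp_upload order depends on which table drives the join.

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def needs_sort_step(details: list[str]) -> bool:
    """SQLite reports an explicit sort as 'USE TEMP B-TREE FOR ORDER BY'."""
    return any("TEMP B-TREE FOR ORDER BY" in d for d in details)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object (object_id INTEGER PRIMARY KEY,"
    " timestamp_upload INTEGER NOT NULL)"
)
QUERY = "SELECT object_id FROM object ORDER BY timestamp_upload DESC LIMIT 48"

before = plan_details(conn, QUERY)   # no usable index: rows must be sorted
conn.execute("CREATE INDEX idx_object_upload ON object (timestamp_upload)")
after = plan_details(conn, QUERY)    # index can be scanned backwards instead

print(needs_sort_step(before), needs_sort_step(after))
```

The LIMIT matters too: when rows come out of the index already in order, the engine can stop after 48 of them instead of sorting the whole result first.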
Answer 0 (score: 1)
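Full text search will be much faster than the standard SQL string comparison functions.
Secondly, if the keywords are highly redundant, you could consider a "many-to-many" implementation: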
Keywords
--------
keyword_id
keyword
keyword_object
-------------
keyword_id
object_id
objects
-------
object_id
......
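If this reduces the string comparisons from 39 million rows to about 100K rows (roughly the size of an English dictionary), you should also see a noticeable improvement: the query only has to perform 100K string comparisons, and joining on the integer keyword_id and object_id fields should be much faster than performing 39M string comparisons.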
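A minimal sketch of the Keywords / keyword_object layout described above, using SQLite through Python's sqlite3 as a stand-in for MySQL. The sample data and the helpers build_demo, tag and search_all are hypothetical. Each keyword string is stored once, and the large mapping table is joined only on integers:

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE keywords (
            keyword_id INTEGER PRIMARY KEY,          -- surrogate integer key
            keyword    TEXT NOT NULL UNIQUE);        -- each string stored once
        CREATE TABLE keyword_object (
            keyword_id INTEGER NOT NULL REFERENCES keywords (keyword_id),
            object_id  INTEGER NOT NULL,
            PRIMARY KEY (keyword_id, object_id));
        """
    )
    return conn


def tag(conn: sqlite3.Connection, object_id: int, keyword: str) -> None:
    """Attach a keyword to an object, creating the keyword row if needed."""
    conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (keyword,))
    (keyword_id,) = conn.execute(
        "SELECT keyword_id FROM keywords WHERE keyword = ?", (keyword,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO keyword_object VALUES (?, ?)", (keyword_id, object_id)
    )


def build_demo() -> sqlite3.Connection:
    """Hypothetical sample: 6 mappings, but only 2 distinct keyword strings."""
    conn = build_schema()
    for oid, kw in [(1, "woman"), (1, "house"), (2, "woman"),
                    (3, "house"), (4, "woman"), (4, "house")]:
        tag(conn, oid, kw)
    return conn


def search_all(conn: sqlite3.Connection, keywords: list[str]) -> list[int]:
    """String comparisons hit only the small keywords table; the big
    keyword_object table is joined on integer keyword_id values."""
    kws = sorted(set(keywords))
    placeholders = ", ".join("?" for _ in kws)
    sql = f"""
        SELECT ko.object_id
        FROM keywords AS k
        JOIN keyword_object AS ko ON ko.keyword_id = k.keyword_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY ko.object_id
        HAVING COUNT(*) = ?
        ORDER BY ko.object_id
    """
    return [row[0] for row in conn.execute(sql, (*kws, len(kws)))]
```

In the demo data, six object/keyword mappings share just two keyword rows, and search_all(build_demo(), ["woman", "house"]) returns objects 1 and 4.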
Answer 1 (score: 0)
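The best solution for this is a FULLTEXT search, but you will probably need a MyISAM table for it. You can set up a mirror table and keep it updated with events and triggers, or, if you have a replication slave, you can change its copy of the table to MyISAM and use that for searching.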
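A minimal sketch of the trigger-maintained mirror table idea, using SQLite through Python's sqlite3 as a stand-in. It shows only the synchronisation mechanism, not MyISAM or FULLTEXT. MySQL's trigger syntax differs (it needs FOR EACH ROW and its own delimiter handling), and the names object_search_mirror, build_with_mirror, mirror_rows and demo are hypothetical:

```python
import sqlite3


def build_with_mirror() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE object_search (
            keyword TEXT NOT NULL, object_id INTEGER NOT NULL,
            PRIMARY KEY (keyword, object_id));
        -- Hypothetical read-only copy used for searching (the MyISAM table
        -- in the answer's MySQL setup).
        CREATE TABLE object_search_mirror (
            keyword TEXT NOT NULL, object_id INTEGER NOT NULL);

        -- Row-level triggers replay every write on the mirror.
        CREATE TRIGGER object_search_ai AFTER INSERT ON object_search
        BEGIN
            INSERT INTO object_search_mirror (keyword, object_id)
            VALUES (NEW.keyword, NEW.object_id);
        END;
        CREATE TRIGGER object_search_au AFTER UPDATE ON object_search
        BEGIN
            UPDATE object_search_mirror
            SET keyword = NEW.keyword, object_id = NEW.object_id
            WHERE keyword = OLD.keyword AND object_id = OLD.object_id;
        END;
        CREATE TRIGGER object_search_ad AFTER DELETE ON object_search
        BEGIN
            DELETE FROM object_search_mirror
            WHERE keyword = OLD.keyword AND object_id = OLD.object_id;
        END;
        """
    )
    return conn


def mirror_rows(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT keyword, object_id FROM object_search_mirror"
        " ORDER BY keyword, object_id"
    ).fetchall()


def demo() -> list[tuple]:
    """Insert, update and delete on the main table, then read the mirror."""
    conn = build_with_mirror()
    conn.executemany(
        "INSERT INTO object_search VALUES (?, ?)",
        [("woman", 1), ("house", 1), ("woman", 2)],
    )
    conn.execute("UPDATE object_search SET keyword = 'tree' WHERE object_id = 2")
    conn.execute("DELETE FROM object_search WHERE keyword = 'house'")
    return mirror_rows(conn)
```

After the three writes in demo(), the mirror holds ('tree', 2) and ('woman', 1), which matches the main table. Every write now costs an extra statement, which is the price of keeping a separately indexed copy.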
For this query, the only thing I can think of is rewriting it as:
SELECT s1.object_id
FROM object_search s1
JOIN object_search s2 ON s2.object_id = s1.object_id AND s2.keyword = 'word2'
JOIN object_search s3 ON s3.object_id = s1.object_id AND s3.keyword = 'word3'
...
WHERE s1.keyword = 'word1'
I am not sure this will be faster.
Also, you would need an index on object_id (assuming your PK is (keyword, object_id)).
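A minimal sketch showing that the self-join rewrite above returns the same objects as the question's GROUP BY / HAVING query. It uses SQLite through Python's sqlite3 as a stand-in, includes the extra index on object_id that the answer mentions, and relies on hypothetical sample data and hypothetical helper names. It says nothing about which form MySQL runs faster:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_search (keyword TEXT NOT NULL,"
        " object_id INTEGER NOT NULL, PRIMARY KEY (keyword, object_id))"
    )
    # The extra index on object_id that the answer asks for.
    conn.execute("CREATE INDEX idx_object_search_object ON object_search (object_id)")
    conn.executemany(
        "INSERT INTO object_search VALUES (?, ?)",
        [("woman", 1), ("house", 1), ("woman", 2), ("house", 3), ("tree", 3),
         ("woman", 4), ("house", 4), ("tree", 4)],
    )
    return conn


def search_self_join(conn: sqlite3.Connection, keywords: list[str]) -> list[int]:
    """One extra self-join per additional keyword, as in the answer."""
    kws = list(dict.fromkeys(keywords))  # drop duplicates, keep order
    joins = "".join(
        f" JOIN object_search s{i} ON s{i}.object_id = s1.object_id"
        f" AND s{i}.keyword = ?"
        for i in range(2, len(kws) + 1)
    )
    sql = (
        f"SELECT s1.object_id FROM object_search s1{joins}"
        " WHERE s1.keyword = ? ORDER BY s1.object_id"
    )
    # Placeholders appear in text order: s2..sN first, then the WHERE.
    return [row[0] for row in conn.execute(sql, (*kws[1:], kws[0]))]


def search_group_by(conn: sqlite3.Connection, keywords: list[str]) -> list[int]:
    """The question's GROUP BY / HAVING formulation, for comparison."""
    kws = sorted(set(keywords))
    placeholders = ", ".join("?" for _ in kws)
    sql = (
        f"SELECT object_id FROM object_search WHERE keyword IN ({placeholders})"
        " GROUP BY object_id HAVING COUNT(object_id) = ? ORDER BY object_id"
    )
    return [row[0] for row in conn.execute(sql, (*kws, len(kws)))]
```

Both functions return [1, 4] for ["woman", "house"] on the sample data. The self-join version adds one join per keyword, so its plan grows with the number of search terms.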
Answer 2 (score: 0)
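If you INSERT rarely and SELECT often, you can optimize the data for reads, i.e. precompute the number of object_ids per keyword and store it directly in the database. SELECT then becomes very fast, while each INSERT may take a few seconds.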
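A minimal sketch of the precomputed per-keyword count, kept current by triggers so that each read is a single primary-key lookup. It uses SQLite through Python's sqlite3 as a stand-in for MySQL. The table keyword_count and the helpers are hypothetical, and only inserts and deletes are handled; updates to object_search would need a third trigger.

```python
import sqlite3


def build_with_counts() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE object_search (
            keyword TEXT NOT NULL, object_id INTEGER NOT NULL,
            PRIMARY KEY (keyword, object_id));
        -- Precomputed answer: how many objects carry each keyword.
        CREATE TABLE keyword_count (
            keyword TEXT PRIMARY KEY, objects INTEGER NOT NULL);

        -- Writes pay the cost of keeping the counter current.
        CREATE TRIGGER keyword_count_ai AFTER INSERT ON object_search
        BEGIN
            INSERT OR IGNORE INTO keyword_count (keyword, objects)
            VALUES (NEW.keyword, 0);
            UPDATE keyword_count SET objects = objects + 1
            WHERE keyword = NEW.keyword;
        END;
        CREATE TRIGGER keyword_count_ad AFTER DELETE ON object_search
        BEGIN
            UPDATE keyword_count SET objects = objects - 1
            WHERE keyword = OLD.keyword;
        END;
        """
    )
    return conn


def objects_with(conn: sqlite3.Connection, keyword: str) -> int:
    """Reads become one primary-key lookup instead of a COUNT over rows."""
    row = conn.execute(
        "SELECT objects FROM keyword_count WHERE keyword = ?", (keyword,)
    ).fetchone()
    return row[0] if row else 0


def demo() -> tuple[int, int, int]:
    conn = build_with_counts()
    conn.executemany(
        "INSERT INTO object_search VALUES (?, ?)",
        [("woman", 1), ("house", 1), ("woman", 2), ("woman", 3)],
    )
    conn.execute("DELETE FROM object_search WHERE keyword = 'woman' AND object_id = 3")
    return objects_with(conn, "woman"), objects_with(conn, "house"), objects_with(conn, "tree")
```

Here demo() reports 2 objects for "woman", 1 for "house" and 0 for an unknown keyword. The trade-off matches the answer: each insert or delete does extra work so that the read path stays cheap.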