Optimizing a join query with SUM, GROUP BY, and ORDER BY clauses

Time: 2017-02-13 17:38:32

Tags: mysql query-optimization

I have the following database schema:

keywords(id, keyword, lang): about 8M records
topics(id, topic, lang): about 2.6M records
topic_keywords(topic_id, keyword_id, weight): 200M records

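In my script I have about 50-100 keywords, each with an additional field keyword_score, and I want to retrieve the top 20 topics matching these keywords, ranked by the following formula: SUM(keyword_score * topic_weight), where topic_weight is the weight column of topic_keywords.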
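
To make the ranking formula concrete, here is a minimal pure-Python sketch of the same computation. The function name `top_topics` and the sample rows are made up for illustration; they are not part of my actual script or data.

```python
import heapq
from collections import defaultdict


def top_topics(keyword_scores, topic_keywords, limit=20):
    """Rank topics by SUM(keyword_score * weight) over the given keywords.

    keyword_scores: dict mapping keyword_id -> keyword_score
    topic_keywords: iterable of (topic_id, keyword_id, weight) rows
    """
    totals = defaultdict(float)
    for topic_id, keyword_id, weight in topic_keywords:
        score = keyword_scores.get(keyword_id)
        if score is not None:  # only keywords from the script's list contribute
            totals[topic_id] += score * weight
    # Highest total first, at most `limit` topics
    return heapq.nlargest(limit, totals.items(), key=lambda item: item[1])


# Made-up sample data, for illustration only
keyword_scores = {1: 2.0, 2: 0.5}
topic_keywords = [
    (10, 1, 0.5),   # topic 10: 2.0 * 0.5  = 1.0
    (10, 2, 2.0),   # topic 10: 0.5 * 2.0  = 1.0, total 2.0
    (11, 1, 0.25),  # topic 11: 2.0 * 0.25 = 0.5
    (12, 3, 9.0),   # keyword 3 is not in the list, so topic 12 is ignored
]
ranking = top_topics(keyword_scores, topic_keywords)
print(ranking)  # [(10, 2.0), (11, 0.5)]
```

This only pins down the intended result; with 200M rows in topic_keywords the aggregation has to happen in the database.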

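The solution I currently implement in my script is:

  • Create a temporary table, temporary_keywords(keyword_id, keyword_score)
  • Insert all 50-100 keywords into it, each with its keyword_score
  • Then execute the following query to retrieve the topics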

    SELECT topic_id,  SUM(weight * keyword_score) AS score
    FROM temporary_keywords
    JOIN topic_keywords USING keyword_id
    GROUP BY topic_id
    ORDER BY score DESC
    LIMIT 20
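
The whole workflow (temporary table, insert, ranking query) can be sketched end to end as follows. This is only an approximation: it assumes an in-memory SQLite database as a stand-in for the MySQL server, the column types are adapted to SQLite, and the sample rows are made up. The join uses the parenthesized `USING (keyword_id)` form.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic_keywords (
        topic_id   INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        weight     REAL DEFAULT 0,
        PRIMARY KEY (topic_id, keyword_id)
    );
    CREATE INDEX keyword_id_idx
        ON topic_keywords (keyword_id, topic_id, weight);
    CREATE TEMPORARY TABLE temporary_keywords (
        keyword_id    INTEGER PRIMARY KEY NOT NULL,
        keyword_score REAL
    );
""")

# Made-up sample rows: (topic_id, keyword_id, weight)
conn.executemany(
    "INSERT INTO topic_keywords VALUES (?, ?, ?)",
    [(10, 1, 0.5), (10, 2, 2.0), (11, 1, 0.25), (12, 3, 9.0)],
)

# Step 2 of the list above: load the script's keywords with their scores
conn.executemany(
    "INSERT INTO temporary_keywords VALUES (?, ?)",
    [(1, 2.0), (2, 0.5)],
)

# Step 3: rank topics by SUM(weight * keyword_score)
rows = conn.execute("""
    SELECT topic_id, SUM(weight * keyword_score) AS score
    FROM temporary_keywords
    JOIN topic_keywords USING (keyword_id)
    GROUP BY topic_id
    ORDER BY score DESC
    LIMIT 20
""").fetchall()
print(rows)  # [(10, 2.0), (11, 0.5)]
conn.close()
```

SQLite's planner and timings say nothing about how MySQL executes the query, so this sketch only checks the logic, not the performance.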
    

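This solution works, but in some cases the query takes up to 3 seconds to execute, which is too long for me.

Is there a way to optimize this query? Or should I redesign the data structure and move it to a NoSQL database?

Any other solutions or ideas beyond those listed above are much appreciated.

Update (SHOW CREATE TABLE)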

CREATE TABLE `topic_keywords` (
  `topic_id` int(11) NOT NULL,
  `keyword_id` int(11) NOT NULL,
  `weight` float DEFAULT '0',
  PRIMARY KEY (`topic_id`,`keyword_id`),
  KEY `keyword_id_idx` (`keyword_id`,`topic_id`,`weight`)
)

CREATE TEMPORARY TABLE temporary_keywords 
(   keyword_id INT PRIMARY KEY NOT NULL,
    keyword_score  DOUBLE 
)

EXPLAIN QUERY

+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| id | select_type | table              | type | possible_keys        | key                  | key_len | ref                                  | rows     | Extra                           |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
|  1 | SIMPLE      | temporary_keywords | ALL  | PRIMARY              | NULL                 | NULL    | NULL                                 |      100 | Using temporary; Using filesort |
|  1 | SIMPLE      | topic_keywords     | ref  | keyword_id_idx       | keyword_id_idx       | 4       | topics.temporary_keywords.keyword_id | 10778853 | Using index                     |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+

1 answer:

Answer 0 (score: 0)

There is a syntax error that went uncaught.

JOIN topic_keywords USING keyword_id

-->

JOIN topic_keywords USING(keyword_id)

If that still does not solve the problem, please provide EXPLAIN FORMAT=JSON SELECT ...