I have the following database schema:
keywords(id, keyword, lang): about 8M records
topics(id, topic, lang): about 2.6M records
topic_keywords(topic_id, keyword_id, weight): 200M records
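In my script I have about 50-100 keywords, each with an additional field keyword_score, and I want to retrieve the top 20 topics matching those keywords, ranked by the formula SUM(keyword_score * topic_weight).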
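To make the ranking formula concrete, here is a minimal pure-Python sketch of what SUM(keyword_score * topic_weight) computes per topic. The function name top_topics and the sample rows are made up for illustration; they are not part of the schema or the script described here.

```python
import heapq
from collections import defaultdict


def top_topics(keyword_scores, topic_keywords, k=20):
    """Return the k highest-scoring (topic_id, score) pairs.

    keyword_scores: dict mapping keyword_id -> keyword_score
    topic_keywords: iterable of (topic_id, keyword_id, weight) rows
    """
    totals = defaultdict(float)
    for topic_id, keyword_id, weight in topic_keywords:
        score = keyword_scores.get(keyword_id)
        if score is not None:  # rows for keywords outside the query set are ignored
            totals[topic_id] += weight * score
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical sample data, for illustration only.
keyword_scores = {1: 2.0, 2: 0.5}
topic_keywords = [
    (10, 1, 0.5),   # topic 10, keyword 1
    (10, 2, 1.0),   # topic 10, keyword 2
    (11, 1, 0.25),  # topic 11, keyword 1
    (12, 3, 4.0),   # keyword 3 is not in the query set
]
result = top_topics(keyword_scores, topic_keywords, k=2)
print(result)
```

Topic 12 never appears in the result because none of its keywords are in the query set, which mirrors the inner join semantics of the SQL version.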
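The solution currently implemented in my script is to create a temporary table temporary_keywords(keyword_id, keyword_score), insert the keywords into it together with their keyword_score, and then run the following query to retrieve the topics: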
SELECT topic_id, SUM(weight * keyword_score) AS score
FROM temporary_keywords
JOIN topic_keywords USING keyword_id
GROUP BY topic_id
ORDER BY score DESC
LIMIT 20
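This solution works, but in some cases it takes up to 3 seconds to execute, which is too slow for me.
Is there a way to optimize this query? Or should I redesign the data structure for a NoSQL database?
Any other solutions or ideas beyond those listed above are much appreciated.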
Update (SHOW CREATE TABLE):
CREATE TABLE `topic_keywords` (
`topic_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
`weight` float DEFAULT '0',
PRIMARY KEY (`topic_id`,`keyword_id`),
KEY `keyword_id_idx` (`keyword_id`,`topic_id`,`weight`)
)
CREATE TEMPORARY TABLE temporary_keywords
( keyword_id INT PRIMARY KEY NOT NULL,
keyword_score DOUBLE
)
EXPLAIN QUERY
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
| 1 | SIMPLE | temporary_keywords | ALL | PRIMARY | NULL | NULL | NULL | 100 | Using temporary; Using filesort |
| 1 | SIMPLE | topic_keywords | ref | keyword_id_idx | keyword_id_idx | 4 | topics.temporary_keywords.keyword_id | 10778853 | Using index |
+----+-------------+--------------------+------+----------------------+----------------------+---------+--------------------------------------+----------+---------------------------------+
Answer 0 (score: 0)
There is a syntax error in the query, yet it was not caught:
JOIN topic_keywords USING keyword_id
-->
JOIN topic_keywords USING(keyword_id)
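As a rough end-to-end check of the corrected join, the sketch below runs the same query with USING (keyword_id) against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, the column types are adapted to it, and the rows are made up; it shows that the query shape is valid and what it returns, and says nothing about performance at 200M rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema adapted from the SHOW CREATE TABLE output (SQLite types, not MySQL).
conn.executescript("""
    CREATE TABLE topic_keywords (
        topic_id   INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL,
        weight     REAL DEFAULT 0,
        PRIMARY KEY (topic_id, keyword_id)
    );
    CREATE INDEX keyword_id_idx
        ON topic_keywords (keyword_id, topic_id, weight);
    CREATE TEMPORARY TABLE temporary_keywords (
        keyword_id    INTEGER PRIMARY KEY NOT NULL,
        keyword_score REAL
    );
""")

# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO topic_keywords (topic_id, keyword_id, weight) VALUES (?, ?, ?)",
    [(10, 1, 0.5), (10, 2, 1.0), (11, 1, 0.25), (12, 3, 4.0)],
)
conn.executemany(
    "INSERT INTO temporary_keywords (keyword_id, keyword_score) VALUES (?, ?)",
    [(1, 2.0), (2, 0.5)],
)

# The query from the question, with the USING clause parenthesized.
rows = conn.execute("""
    SELECT topic_id, SUM(weight * keyword_score) AS score
    FROM temporary_keywords
    JOIN topic_keywords USING (keyword_id)
    GROUP BY topic_id
    ORDER BY score DESC
    LIMIT 20
""").fetchall()
print(rows)
conn.close()
```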
If that still does not solve the problem, please provide the output of EXPLAIN FORMAT=JSON SELECT ...