This query takes a minute to complete:
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5
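Each keyword has an ID associated with it (the keyword_id column). That ID is used to look up the actual keyword in the keyword table.
movie_keyword has 2.8 million rows
keyword has 127,000 rows
However, returning the most frequently used keyword_id values takes only 1 second: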
SELECT keyword_id, count(*)
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
Is there a more efficient way to do this?
EXPLAIN output:
id  select_type  table          type  possible_keys  key            key_len  ref              rows    Extra
1   SIMPLE       keyword        ALL   PRIMARY        NULL           NULL     NULL             125405  Using temporary; Using filesort
1   SIMPLE       movie_keyword  ref   idx_keywordid  idx_keywordid  4        imdb.keyword.id  28      Using index
Structure:
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_mid` (`movie_id`),
KEY `idx_keywordid` (`keyword_id`),
KEY `keyword_ix` (`keyword_id`),
CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword` text NOT NULL,
`phonetic_code` varchar(5) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_keyword` (`keyword`(5)),
KEY `idx_pcode` (`phonetic_code`),
KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;
Answer 0 (score: 1)
Untested, but I think this should work and be noticeably faster. I'm not sure whether MySQL allows LIMIT in a subquery, but there are other ways.
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
)
GROUP BY keyword
ORDER BY count(*) DESC;
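This should be faster because you are not joining all 2.8 million rows of movie_keyword against keyword, only the rows that actually match, which I would guess are far fewer.
Edit: since MySQL does not support LIMIT inside an IN subquery, you will have to run this first: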
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5;
Then, after fetching the results, run the second query:
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
GROUP BY keyword
ORDER BY count(*) DESC;
Replace RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS with the actual values programmatically, in whatever language you are using.
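The two-step approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are invented, and the helper name top_keywords is hypothetical. It binds the IDs through placeholders instead of pasting a comma-separated string into the SQL.

```python
import sqlite3

# Toy stand-in for the real tables (SQLite here, not MySQL; the rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword(id)
    );
""")
conn.executemany("INSERT INTO keyword (id, keyword) VALUES (?, ?)",
                 [(1, "murder"), (2, "sequel"), (3, "robot")])
# keyword 1 is used three times, keyword 2 twice, keyword 3 once
conn.executemany("INSERT INTO movie_keyword (movie_id, keyword_id) VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3)])


def top_keywords(conn, limit):
    """Hypothetical helper: two queries instead of one big join."""
    # Step 1: find the most frequent keyword ids without touching `keyword`.
    ids = [row[0] for row in conn.execute(
        "SELECT keyword_id FROM movie_keyword "
        "GROUP BY keyword_id ORDER BY COUNT(*) DESC LIMIT ?", (limit,))]
    if not ids:
        return []
    # Step 2: one "?" per id, so the values are bound rather than concatenated.
    placeholders = ", ".join("?" * len(ids))
    query = (
        "SELECT keyword.keyword, COUNT(*) AS occurrences "
        "FROM movie_keyword JOIN keyword ON keyword.id = movie_keyword.keyword_id "
        f"WHERE movie_keyword.keyword_id IN ({placeholders}) "
        "GROUP BY keyword.id, keyword.keyword ORDER BY occurrences DESC"
    )
    return conn.execute(query, ids).fetchall()


print(top_keywords(conn, 2))
```

With a MySQL driver the placeholder marker is usually %s rather than ?, so the string building would need adjusting for whichever driver is in use.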
Answer 1 (score: 0)
The query looks fine, but I don't think the structure is. Try adding an index on the column
keyword.id
Try:
CREATE INDEX keyword_ix ON keyword (id);
or
ALTER TABLE keyword ADD INDEX keyword_ix (id);
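It would be much better if you could post the structure of both tables, keyword and movie_keyword. Which of the two is the primary table and which is the referencing table?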
SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
INNER JOIN keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
-- backticks, not single quotes: a single-quoted string here is a constant and sorts nothing
ORDER BY `Number of Occurences` DESC
LIMIT 5
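Answer 2 (score: 0)
I know this is a very old question, but since I think xception forgot about derived tables in MySQL, I would like to propose another solution. It needs only one query and avoids joining the large table. If anyone has data of this size and can test it (perhaps the question's author), please share the results.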
SELECT keyword.keyword, _temp.occurences
FROM (
SELECT keyword_id, COUNT( keyword_id ) AS occurences
FROM movie_keyword
GROUP BY keyword_id
ORDER BY occurences DESC
LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
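As a quick sanity check of the derived-table idea, the sketch below runs both query shapes against a tiny in-memory data set and compares the results. It rests on several assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the rows are invented, and LIMIT is 2 instead of 5 to suit the toy data. It says nothing about timing on 2.8 million rows.

```python
import sqlite3

# Toy stand-in for the real tables (SQLite, not MySQL; the rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword(id)
    );
""")
conn.executemany("INSERT INTO keyword (id, keyword) VALUES (?, ?)",
                 [(1, "murder"), (2, "sequel"), (3, "robot")])
# keyword 1 is used three times, keyword 2 twice, keyword 3 once
conn.executemany("INSERT INTO movie_keyword (movie_id, keyword_id) VALUES (?, ?)",
                 [(10, 1), (11, 1), (12, 1), (10, 2), (11, 2), (12, 3)])

# Shape of the original query: join everything, then group by the keyword text.
JOIN_FIRST = """
    SELECT keyword.keyword, COUNT(*) AS occurences
    FROM movie_keyword
    JOIN keyword ON keyword.id = movie_keyword.keyword_id
    GROUP BY keyword.keyword
    ORDER BY occurences DESC
    LIMIT 2
"""

# Shape of this answer: aggregate and limit first, then join only the survivors.
DERIVED = """
    SELECT keyword.keyword, _temp.occurences
    FROM (
        SELECT keyword_id, COUNT(keyword_id) AS occurences
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY occurences DESC
        LIMIT 2
    ) AS _temp
    JOIN keyword ON _temp.keyword_id = keyword.id
    ORDER BY _temp.occurences DESC
"""

join_first = conn.execute(JOIN_FIRST).fetchall()
derived = conn.execute(DERIVED).fetchall()
print(join_first)
print(derived)
```

One caveat: the original query groups by the keyword text while the derived table groups by keyword_id, so the two agree only when no keyword text appears under more than one id, and ties in the counts may be ordered differently.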