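I have 2 large MySQL tables: Articles and ArticleTopics. I want to query the database and retrieve the last 30 articles published for a given topicId. My current query is quite slow. Any ideas on how to improve it?
Tables: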
Articles (~1 million rows)
+-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| articleId | int(11) | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| pubDate | datetime | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+----------------+
ArticleTopics (~10 million rows)
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| articleId | int(11) | NO | MUL | NULL | |
| topicId | int(11) | NO | MUL | NULL | |
+-----------+--------------+------+-----+---------+-------+
My query:
SELECT a.articleId, a.pubDate
FROM Articles a, ArticleTopics t
WHERE t.articleId=a.articleId AND t.topicId=3364
ORDER BY a.pubDate DESC LIMIT 30;
EXPLAIN output for the query:
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
| 1 | SIMPLE | t | ref | articleId,topicId,topicId_articleId | topicId_articleId | 4 | const | 4281 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | a | eq_ref | PRIMARY,articleId_pubDate | PRIMARY | 4 | t.articleId | 1 | |
+----+-------------+-------+--------+-------------------------------------+-------------------+---------+-------------------+------+----------------------------------------------+
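I think the slowness comes from ORDER BY a.pubDate DESC, which matches the "Using temporary; Using filesort" in the EXPLAIN above. I can improve performance by cheating a little: ordering by t.articleId DESC instead and using an index on (topicId, articleId) in ArticleTopics, because articleIds are generally in the same order as pubDates. However, they are not always in the same order, so this is not ideal. I would like to be able to sort on pubDate.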
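A small sketch of the trade-off described above, using Python's built-in sqlite3 module as a stand-in for MySQL, so it shows which rows each ORDER BY returns, not MySQL's timings or plans. The four articles, their dates, and LIMIT 2 are hypothetical toy data; article 4 is back-dated, which is exactly the case where the articleId workaround goes wrong.

```python
import sqlite3

# Hypothetical toy data. Article 4 was inserted last but back-dated, so
# articleId order and pubDate order disagree for it.
ARTICLES = [
    (1, "first", "", "2011-01-01 09:00:00"),
    (2, "second", "", "2011-01-02 09:00:00"),
    (3, "third", "", "2011-01-03 09:00:00"),
    (4, "back-dated", "", "2010-12-31 09:00:00"),
]
TOPIC_ID = 3364


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Articles (
            articleId INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            content   TEXT NOT NULL,
            pubDate   TEXT NOT NULL
        );
        CREATE TABLE ArticleTopics (
            articleId INTEGER NOT NULL,
            topicId   INTEGER NOT NULL
        );
        -- The workaround relies on this composite index: topicId for the
        -- equality filter, articleId already sorted within each topic.
        CREATE INDEX topicId_articleId ON ArticleTopics (topicId, articleId);
        """
    )
    conn.executemany("INSERT INTO Articles VALUES (?, ?, ?, ?)", ARTICLES)
    conn.executemany(
        "INSERT INTO ArticleTopics VALUES (?, ?)",
        [(row[0], TOPIC_ID) for row in ARTICLES],
    )
    return conn


def latest_ids(conn: sqlite3.Connection, order_col: str, limit: int) -> list[int]:
    # order_col is only ever one of the two fixed strings used below.
    sql = f"""
        SELECT a.articleId
        FROM Articles a, ArticleTopics t
        WHERE t.articleId = a.articleId AND t.topicId = ?
        ORDER BY {order_col} DESC
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (TOPIC_ID, limit))]


conn = build_db()
by_date = latest_ids(conn, "a.pubDate", 2)   # what the question wants
by_id = latest_ids(conn, "t.articleId", 2)   # the ORDER BY t.articleId workaround
print("by pubDate:  ", by_date)
print("by articleId:", by_id)
```

With this data the pubDate ordering returns [3, 2] while the articleId ordering returns [4, 3]: the back-dated article is reported as the newest one.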
Update: added the EXPLAIN output.
Answer 0 (score: 1)
You can rewrite the query in various ways and see whether that speeds it up:
SELECT a.articleId, a.pubDate
FROM Articles a
WHERE a.articleId in (
select articleId
from ArticleTopics
where topicId = 3364
)
ORDER BY a.pubDate DESC LIMIT 30;
Or:
SELECT a.articleId, a.pubDate
FROM Articles a
INNER JOIN ArticleTopics t ON t.articleId = a.articleId
WHERE t.topicId = 3364
ORDER BY a.pubDate DESC LIMIT 30;
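Before timing the rewrites on real data, it can help to confirm that they return the same rows as the original comma join. The sketch below does that with Python's sqlite3 module as a stand-in for MySQL, a simplified hypothetical schema (title and content dropped), and synthetic dates; it says nothing about speed, which is up to MySQL's optimizer. In MySQL the comma join and the INNER JOIN are treated the same way, while versions before 5.6 are known to run IN (subquery) as a dependent subquery, so the IN form is worth checking with EXPLAIN.

```python
import sqlite3
from datetime import datetime, timedelta

TOPIC_ID = 3364

# The three formulations from this thread, each with one placeholder.
QUERIES = {
    "comma join (question)": """
        SELECT a.articleId, a.pubDate
        FROM Articles a, ArticleTopics t
        WHERE t.articleId = a.articleId AND t.topicId = ?
        ORDER BY a.pubDate DESC LIMIT 30
    """,
    "IN subquery": """
        SELECT a.articleId, a.pubDate
        FROM Articles a
        WHERE a.articleId IN (
            SELECT articleId FROM ArticleTopics WHERE topicId = ?
        )
        ORDER BY a.pubDate DESC LIMIT 30
    """,
    "INNER JOIN": """
        SELECT a.articleId, a.pubDate
        FROM Articles a
        INNER JOIN ArticleTopics t ON t.articleId = a.articleId
        WHERE t.topicId = ?
        ORDER BY a.pubDate DESC LIMIT 30
    """,
}


def build_db(n_articles: int = 100) -> sqlite3.Connection:
    # Simplified schema: title/content are omitted because no query reads them.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Articles (articleId INTEGER PRIMARY KEY, pubDate TEXT NOT NULL);
        CREATE TABLE ArticleTopics (articleId INTEGER NOT NULL, topicId INTEGER NOT NULL);
        CREATE INDEX topicId_articleId ON ArticleTopics (topicId, articleId);
        """
    )
    start = datetime(2011, 1, 1)
    for i in range(1, n_articles + 1):
        # (i * 37) % 101 is distinct for i in 1..100, so pubDates never tie
        # and are deliberately not in articleId order.
        pub = start + timedelta(days=(i * 37) % 101)
        conn.execute("INSERT INTO Articles VALUES (?, ?)", (i, pub.isoformat(" ")))
        conn.execute("INSERT INTO ArticleTopics VALUES (?, ?)", (i, 1))
        if i % 2 == 0:  # every other article is also tagged with TOPIC_ID
            conn.execute("INSERT INTO ArticleTopics VALUES (?, ?)", (i, TOPIC_ID))
    return conn


conn = build_db()
results = {name: conn.execute(sql, (TOPIC_ID,)).fetchall() for name, sql in QUERIES.items()}
reference = results["comma join (question)"]
for name, rows in results.items():
    print(f"{name}: {len(rows)} rows, same as question: {rows == reference}")
```

The forms can only disagree if ArticleTopics holds the same (articleId, topicId) pair more than once: the two joins would then return that article twice, while the IN form returns it once.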
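For both queries, the important index is the one on Articles that has articleId as its first field.
If Articles is a large table, for example one whose rows store the binary contents of an entire PDF, you can create an index that fully covers the query. Fully covering means that every selected field is part of the index, so the query can be answered from the index alone without reading the wide rows. For this query, a fully covering index would be (articleId, pubDate).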
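The definition of a covering index can be seen directly in a query plan. The sketch below uses SQLite as a stand-in for MySQL and a hypothetical single-column index named pubDate_idx: the two queries share the same WHERE clause and differ only in the select list, and only the one whose columns are all in the index is reported as USING COVERING INDEX. SQLite stores the rowid (here the INTEGER PRIMARY KEY articleId) in every index entry, which is why articleId counts as covered; InnoDB likewise appends the primary key to secondary index records. In MySQL's EXPLAIN the equivalent marker is "Using index" in the Extra column, which the question's EXPLAIN already shows for ArticleTopics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Articles (
        articleId INTEGER PRIMARY KEY,  -- rowid alias, stored in every index entry
        title     TEXT NOT NULL,
        content   TEXT NOT NULL,        -- the wide column we want to avoid reading
        pubDate   TEXT NOT NULL
    );
    -- Hypothetical index name; the question's table has a non-unique key on pubDate.
    CREATE INDEX pubDate_idx ON Articles (pubDate);
    """
)


def plan(sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("2011-01-01 00:00:00",))
    return " | ".join(row[3] for row in rows)  # 4th column is the readable detail


# Every selected column is in the index (articleId via the rowid): covered.
covered = plan("SELECT articleId, pubDate FROM Articles WHERE pubDate = ?")
# title is not in the index, so each match needs a lookup into the table row.
uncovered = plan("SELECT title, pubDate FROM Articles WHERE pubDate = ?")

print("covered:  ", covered)
print("uncovered:", uncovered)
```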
Answer 1 (score: 0)
Do you currently have an index on topicId? If so, does that index contain only the topicId field?
Could you post the EXPLAIN output for the query?
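In MySQL, the information asked for here comes from SHOW INDEX FROM ArticleTopics and from running EXPLAIN on the query. As a runnable sketch of the same check, the code below uses SQLite's PRAGMA index_list and PRAGMA index_info to list each index's columns, then picks out the indexes that contain only topicId versus those that merely start with it. The index names come from possible_keys in the question's EXPLAIN; their column lists are assumptions.

```python
import sqlite3


def index_columns(conn: sqlite3.Connection, table: str) -> dict[str, list[str]]:
    """Map each index on `table` to its columns, in index order."""
    indexes = {}
    for row in conn.execute(f'PRAGMA index_list("{table}")'):
        name = row[1]  # index_list rows start with (seq, name, unique, ...)
        info = conn.execute(f'PRAGMA index_info("{name}")').fetchall()
        # index_info rows are (seqno, cid, column name); order by seqno.
        indexes[name] = [col for _, _, col in sorted(info)]
    return indexes


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ArticleTopics (articleId INTEGER NOT NULL, topicId INTEGER NOT NULL);
    -- Names from possible_keys in the question's EXPLAIN; column lists assumed.
    CREATE INDEX articleId ON ArticleTopics (articleId);
    CREATE INDEX topicId ON ArticleTopics (topicId);
    CREATE INDEX topicId_articleId ON ArticleTopics (topicId, articleId);
    """
)

indexes = index_columns(conn, "ArticleTopics")
only_topic = sorted(n for n, cols in indexes.items() if cols == ["topicId"])
leading_topic = sorted(n for n, cols in indexes.items() if cols[0] == "topicId")
print(indexes)
print("indexes on topicId alone:", only_topic)
print("indexes led by topicId:  ", leading_topic)
```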