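I have 3 tables: posts, posts_groups (named posts_to_groups in the schema below), and groups, with a many-to-many relationship between posts and groups. To get all posts for specific groups, I need to join the posts and posts_groups tables. That join is now really slow. I described a very similar case here: MySQL JOIN / IN performance optimization
I think I need to denormalize this structure to improve performance. What is the best practice for this in MySQL? Could I create a new table for posts that stores some kind of hash of the groups each post belongs to? Based on that hash, I could fetch all posts from specific groups with a single SELECT query. If not, can you suggest the most appropriate way to improve performance for this structure?
Update
Sample query: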
SELECT p.post_id, p.date_created, p.description, p.last_edited,
p.link, p.link_description, p.link_image_url, p.link_title,
p.total_comments, p.total_votes, p.type_id, p.user_id
FROM posts p
JOIN
( SELECT DISTINCT post_id
FROM posts_to_groups
WHERE group_id IN (1, 2, 3, 4, 5)
) AS ptt USING (post_id)
ORDER BY p.last_edited DESC,
p.total_votes DESC
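LIMIT 25;
This query runs fast only in a non-concurrent environment, at about 150 ms. Under a performance test (JMeter) with about 50 concurrent users, it takes 5 seconds.
Table definitions: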
CREATE TABLE `posts` (
`post_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(255) NOT NULL,
`type_id` int(11) NOT NULL,
`description` text,
`link` varchar(1024) DEFAULT NULL,
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`total_votes` int(11) DEFAULT '0',
`total_comments` int(11) DEFAULT '0',
`link_title` varchar(1024) DEFAULT NULL,
`link_description` varchar(1024) DEFAULT NULL,
`link_image_url` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`post_id`),
KEY `fk_post_type_id` (`type_id`),
FULLTEXT KEY `description` (`description`),
CONSTRAINT `fk_post_type_id` FOREIGN KEY (`type_id`) REFERENCES `post_types` (`post_type_id`)
)
ENGINE=InnoDB AUTO_INCREMENT=109919 DEFAULT CHARSET=utf8;
CREATE TABLE `posts_to_groups` (
`group_id` int(11) NOT NULL,
`post_id` int(11) NOT NULL,
PRIMARY KEY (`group_id`,`post_id`),
KEY `post_to_groups_fk_post_id` (`post_id`),
CONSTRAINT `post_to_groups_fk_post_id` FOREIGN KEY (`post_id`) REFERENCES `posts` (`post_id`),
CONSTRAINT `post_to_groups_fk_group_id` FOREIGN KEY (`group_id`) REFERENCES `groups` (`group_id`)
)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `groups` (
`group_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` varchar(255) NOT NULL,
`title` varchar(255) NOT NULL,
`description` text NOT NULL,
`date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`total_members` int(11) NOT NULL DEFAULT '0',
`total_posts` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`group_id`),
KEY `user_id_idx` (`user_id`),
FULLTEXT KEY `title` (`title`,`description`)
)
ENGINE=InnoDB AUTO_INCREMENT=1288 DEFAULT CHARSET=utf8;
Answer 0 (score: 0)
It looks to me like you are doing a semi-join. The usual way to write one is with an EXISTS expression:
SELECT p.post_id, p.date_created, p.description, p.last_edited,
p.link, p.link_description, p.link_image_url, p.link_title,
p.total_comments, p.total_votes, p.type_id, p.user_id
FROM posts p
WHERE EXISTS (
SELECT 1
FROM posts_to_groups
WHERE post_id = p.post_id
AND group_id IN (1, 2, 3, 4, 5)
)
ORDER BY p.last_edited DESC,
p.total_votes DESC
LIMIT 25;
Alternatively, since there is only one key column, you can try an IN expression:
SELECT p.post_id, p.date_created, p.description, p.last_edited,
p.link, p.link_description, p.link_image_url, p.link_title,
p.total_comments, p.total_votes, p.type_id, p.user_id
FROM posts p
WHERE post_id IN (
SELECT post_id
FROM posts_to_groups
WHERE group_id IN (1, 2, 3, 4, 5)
)
ORDER BY p.last_edited DESC,
p.total_votes DESC
LIMIT 25;
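The IN expression may perform better, depending on your data and the version of MySQL you are running. Older versions had problems optimizing EXISTS.
In both cases, I would want an index on (posts.post_id) and an index on (posts_to_groups.post_id, posts_to_groups.group_id).
Second attempt: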
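A minimal sketch of adding the second index, assuming the schema from the question. The index name idx_ptg_post_group is hypothetical. posts.post_id is already the primary key of posts, so it needs nothing extra. Because InnoDB secondary indexes carry the primary key columns, the existing key on post_id may already behave much like this composite index, so it is worth checking before adding it.

```sql
-- Assumption: idx_ptg_post_group is a made-up index name, not part of the original schema.
-- posts.post_id is already covered by the PRIMARY KEY of posts.
ALTER TABLE posts_to_groups
  ADD INDEX idx_ptg_post_group (post_id, group_id);

-- List the indexes now defined on the junction table to confirm the change.
SHOW INDEX FROM posts_to_groups;
```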
SELECT DISTINCT p.post_id, p.date_created, p.description, p.last_edited,
p.link, p.link_description, p.link_image_url, p.link_title,
p.total_comments, p.total_votes, p.type_id, p.user_id
FROM posts p
JOIN posts_to_groups pg
ON p.post_id = pg.post_id
WHERE pg.group_id IN (1, 2, 3, 4, 5)
ORDER BY p.last_edited DESC,
p.total_votes DESC
LIMIT 25;
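One way to check which of the variants in this answer suits a given MySQL version and data set is to prefix each with EXPLAIN and compare the plans. The sketch below does this for the EXISTS form only, with a shortened select list and a pg alias added for readability. It is an illustration, not a measured result.

```sql
-- Show the execution plan instead of running the query.
-- In the output, a select_type of DEPENDENT SUBQUERY means the subquery is
-- re-evaluated per outer row; "Using temporary" or "Using filesort" in the
-- Extra column point at the sort on last_edited and total_votes.
EXPLAIN
SELECT p.post_id
FROM posts p
WHERE EXISTS (
    SELECT 1
    FROM posts_to_groups pg
    WHERE pg.post_id = p.post_id
      AND pg.group_id IN (1, 2, 3, 4, 5)
)
ORDER BY p.last_edited DESC,
         p.total_votes DESC
LIMIT 25;
```

Repeating the same EXPLAIN for the IN form and for the JOIN with DISTINCT shows whether the optimizer treats them differently on your server.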