Using many_2_many in MySQL

Date: 2015-06-19 15:35:01

Tags: mysql sql performance

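I have 3 tables: posts, posts_groups (named posts_to_groups in the schema below), and groups, with a MANY_2_MANY relationship between posts and groups. To fetch all posts for a particular group, I need to join the posts table with the join table. Right now that join is really slow. I described a very similar case here: MySQL JOIN / IN performance optimization

I think I need to denormalize this structure to improve performance. What is the best practice for this in MySQL? Could I create a new table for posts that stores some kind of hash of the groups each post belongs to? Based on that hash, I could fetch all posts from a particular group with a single SELECT. If not, could you suggest the most appropriate way to improve the performance of this structure?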

UPDATED

Example query:

SELECT  p.post_id, p.date_created, p.description, p.last_edited,
        p.link, p.link_description, p.link_image_url, p.link_title,
        p.total_comments, p.total_votes, p.type_id, p.user_id
    FROM  posts p
    JOIN  
      ( SELECT  DISTINCT  post_id
            FROM  posts_to_groups
            WHERE  group_id IN (1, 2, 3, 4, 5)
      ) AS ptt USING (post_id)
    ORDER BY  p.last_edited DESC,
              p.total_votes DESC
    LIMIT  25

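This query is fast only in a non-concurrent environment: ~150 ms. Under a performance test (JMeter) with about 50 concurrent users, it takes 5 seconds.

CREATE TABLE statements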

CREATE TABLE `posts` (
    `post_id` int(11) NOT NULL AUTO_INCREMENT,
    `user_id` varchar(255) NOT NULL,
    `type_id` int(11) NOT NULL,
    `description` text,
    `link` varchar(1024) DEFAULT NULL,
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `last_edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `total_votes` int(11) DEFAULT '0',
    `total_comments` int(11) DEFAULT '0',
    `link_title` varchar(1024) DEFAULT NULL,
    `link_description` varchar(1024) DEFAULT NULL,
    `link_image_url` varchar(1024) DEFAULT NULL,

    PRIMARY KEY (`post_id`),
    KEY `fk_post_type_id` (`type_id`),
    FULLTEXT KEY `description` (`description`),
    CONSTRAINT `fk_post_type_id` FOREIGN KEY (`type_id`) REFERENCES `post_types` (`post_type_id`)
) 
ENGINE=InnoDB AUTO_INCREMENT=109919 DEFAULT CHARSET=utf8

CREATE TABLE `posts_to_groups` (
    `group_id` int(11) NOT NULL,
    `post_id` int(11) NOT NULL,

    PRIMARY KEY (`group_id`,`post_id`),
    KEY `post_to_groups_fk_post_id` (`post_id`),
    CONSTRAINT `post_to_groups_fk_post_id` FOREIGN KEY (`post_id`) REFERENCES `posts` (`post_id`),
    CONSTRAINT `post_to_groups_fk_group_id` FOREIGN KEY (`group_id`) REFERENCES `groups` (`group_id`)
) 
ENGINE=InnoDB DEFAULT CHARSET=utf8

CREATE TABLE `groups` (
    `group_id` int(11) NOT NULL AUTO_INCREMENT,
    `user_id` varchar(255) NOT NULL,
    `title` varchar(255) NOT NULL,
    `description` text NOT NULL,
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `total_members` int(11) NOT NULL DEFAULT '0',
    `total_posts` int(11) NOT NULL DEFAULT '0',

    PRIMARY KEY (`group_id`),
    KEY `user_id_idx` (`user_id`),
    FULLTEXT KEY `title` (`title`,`description`)
) 
ENGINE=InnoDB AUTO_INCREMENT=1288 DEFAULT CHARSET=utf8

1 answer:

Answer 0 (score: 0)

It looks to me like you are doing a semi-join. The usual way to write that is with an EXISTS expression:

SELECT  p.post_id, p.date_created, p.description, p.last_edited,
        p.link, p.link_description, p.link_image_url, p.link_title,
        p.total_comments, p.total_votes, p.type_id, p.user_id
FROM  posts p
WHERE EXISTS (
        SELECT 1
        FROM posts_to_groups
        WHERE post_id = p.post_id
            AND group_id IN (1, 2, 3, 4, 5)
    )
ORDER BY  p.last_edited DESC,
          p.total_votes DESC
LIMIT  25;

Alternatively, since there is only one key field, you could try an IN expression:

SELECT  p.post_id, p.date_created, p.description, p.last_edited,
        p.link, p.link_description, p.link_image_url, p.link_title,
        p.total_comments, p.total_votes, p.type_id, p.user_id
FROM  posts p
WHERE post_id IN (
        SELECT post_id
        FROM posts_to_groups
        WHERE group_id IN (1, 2, 3, 4, 5)
    )
ORDER BY  p.last_edited DESC,
          p.total_votes DESC
LIMIT  25;

The IN expression may perform better, depending on your data and on the MySQL version you are running. Older versions had problems optimizing EXISTS.
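
One way to find out which form your server handles better is to check its version and compare the execution plans. The sketch below uses only the tables and columns from the question, with a shortened column list for readability (substitute the full list when testing for real). The plan output depends on version and data, so none is shown here.

```sql
-- Which MySQL version is this server running?
SELECT VERSION();

-- Plan for the EXISTS form (column list shortened for illustration).
EXPLAIN
SELECT p.post_id, p.last_edited, p.total_votes
FROM posts p
WHERE EXISTS (
        SELECT 1
        FROM posts_to_groups ptg
        WHERE ptg.post_id = p.post_id
            AND ptg.group_id IN (1, 2, 3, 4, 5)
    )
ORDER BY p.last_edited DESC, p.total_votes DESC
LIMIT 25;

-- Plan for the IN form, same shortened column list.
EXPLAIN
SELECT p.post_id, p.last_edited, p.total_votes
FROM posts p
WHERE p.post_id IN (
        SELECT post_id
        FROM posts_to_groups
        WHERE group_id IN (1, 2, 3, 4, 5)
    )
ORDER BY p.last_edited DESC, p.total_votes DESC
LIMIT 25;
```

If one of the plans shows a DEPENDENT SUBQUERY row, the subquery is probably being re-evaluated for outer rows, which is a likely explanation when one form is much slower than the other.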

In both cases, I would want an index on (posts.post_id) as well as an index on (posts_to_groups.post_id, posts_to_groups.group_id).
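
As a rough sketch against the schema in the question: posts.post_id is already the primary key, so only the join table might need something new. The index name idx_ptg_post_group is made up for illustration. Note that the question's schema already has a single-column key on post_id, and InnoDB secondary indexes implicitly carry the primary key columns, so the explicit composite index may add little; it is worth comparing EXPLAIN output before and after.

```sql
-- See which indexes already exist on the join table.
SHOW INDEX FROM posts_to_groups;

-- Add the composite index suggested above.
-- idx_ptg_post_group is a hypothetical name, not part of the original schema.
ALTER TABLE posts_to_groups
    ADD INDEX idx_ptg_post_group (post_id, group_id);
```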

Second attempt:

SELECT  DISTINCT p.post_id, p.date_created, p.description, p.last_edited,
        p.link, p.link_description, p.link_image_url, p.link_title,
        p.total_comments, p.total_votes, p.type_id, p.user_id
FROM  posts p
JOIN  posts_to_groups pg
      ON p.post_id = pg.post_id
WHERE pg.group_id IN (1, 2, 3, 4, 5)
ORDER BY  p.last_edited DESC,
          p.total_votes DESC
LIMIT  25;
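
The DISTINCT matters in this version because a post that belongs to more than one of the listed groups matches several rows in posts_to_groups, and a plain join returns it once per match. To see how often that happens in your data, a check along these lines may help. It uses only the join table from the question, and the LIMIT of 10 is arbitrary.

```sql
-- Posts that belong to more than one of the listed groups.
-- Each of these would appear several times in the join without DISTINCT.
SELECT post_id, COUNT(*) AS matching_groups
FROM posts_to_groups
WHERE group_id IN (1, 2, 3, 4, 5)
GROUP BY post_id
HAVING COUNT(*) > 1
LIMIT 10;
```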