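MySQL include/exclude posts

Time: 2010-10-11 05:40:37

Tags: php mysql optimization doctrine

This post took a good deal of time to type up because I wanted to be as clear as possible, so please bear with me if it is still unclear.

Basically, I have a posts table in the database, and users can add privacy settings to each post.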

ID | owner_id | post | other_info | privacy_level (int value)

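From there, users can add their privacy details, allowing a post to be viewed by everyone (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For the privacy level that names specific people (4), the query references the table "post_privacy_includes_for" in a subquery to see whether the user (or a filter the user belongs to) exists in a row of that table.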

ID | post_id | user_id | list_id

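In addition, users can stop certain people within a larger group from viewing their posts by excluding them (for example, making a post viewable by everyone but hiding it from a stalker). For this, another reference table, "post_privacy_exclude_from", was added; it has the same layout as "post_privacy_includes_for".

My problem is that this doesn't scale. At all. Currently there are about 1-2 million posts, most of which are viewable by everyone. For every post on a page, the query has to check whether there is a row that excludes the post from being shown to the user, and this is really slow on a page that can hold 100-200 posts. It can take up to 2-4 seconds, especially when additional constraints are added to the query.

This also produces extremely large and complex queries that are just... awkward.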

SELECT t.*
FROM posts t
WHERE ( (t.privacy_level = 3
         AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             ( SELECT i.id
              FROM PostPrivacyIncludeFor i
              WHERE i.user_id = ?
                AND i.thought_id = t.id)
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 4
           AND EXISTS
             (SELECT i2.id
              FROM PostPrivacyIncludeFor i2
              WHERE i2.thought_id = t.id
                AND EXISTS
                  (SELECT r.id
                   FROM FriendFilterIds r
                   WHERE r.list_id = i2.list_id
                     AND r.friend_id = ?))
           OR t.privacy_level = 4
           AND t.owner_id = ?)
       OR (t.privacy_level = 1
           AND EXISTS
             (SELECT G.id
              FROM Following G
              WHERE follower_id = t.owner_id
                AND following_id = ?
                AND friend = 1)
           OR t.privacy_level = 1
           AND t.owner_id = ?)
       OR (NOT EXISTS
             (SELECT e.id
              FROM PostPrivacyExcludeFrom e
              WHERE e.thought_id = t.id
                AND e.user_id = ?
                AND NOT EXISTS
                  (SELECT e2.id
                   FROM PostPrivacyExcludeFrom e2
                   WHERE e2.thought_id = t.id
                     AND EXISTS
                       (SELECT l.id
                        FROM FriendFilterIds l
                        WHERE l.list_id = e2.list_id
                          AND l.friend_id = ?)))
           AND t.privacy_level IN (0, 1, 4)))
  AND t.owner_id = ?
ORDER BY t.created_at LIMIT 100

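(A mock query, similar to the one I use in Doctrine ORM. It's a mess, but you get what I mean.)

I guess my question is: how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to scrap my current approach entirely, but I don't know what to move to.

Thanks, guys.

Update: fixed the query to reflect the values I defined above for the privacy levels (I forgot to update it because I had simplified the values).

2 answers:

Answer 0: (score: 1)

Your query is too long to give a definitive solution, but the approach I would follow is to do the data lookup simply, by converting the subqueries into joins, and then build the logic into the where clause and the column list of the select statement: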

select t.*, i.*, r.*, G.*, e.* from posts t
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ?
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ? 

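(This may need extending: I couldn't follow the logic of the final clause.)

If you can get the simple select to run quickly and to include all the information you need, then all that remains is to build the logic in the select list and the where clause.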
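To make the join-based idea concrete, here is a minimal sketch. It is not the asker's real schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, keeps only the columns the visibility rules need, and leaves out the list-based filters (FriendFilterIds) and the exclusion table. Table and column names follow the question's query; the function name visible_post_ids and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INT, privacy_level INT);
    CREATE TABLE PostPrivacyIncludeFor (id INTEGER PRIMARY KEY, thought_id INT, user_id INT);
    CREATE TABLE Following (id INTEGER PRIMARY KEY, follower_id INT, following_id INT, friend INT);

    -- Hypothetical sample rows: user 10 owns posts 1-4, user 11 owns post 5.
    INSERT INTO posts VALUES (1, 10, 0), (2, 10, 1), (3, 10, 3), (4, 10, 4), (5, 11, 4);
    -- Post 4 is explicitly shared with user 20.
    INSERT INTO PostPrivacyIncludeFor (thought_id, user_id) VALUES (4, 20);
    -- Same column usage as the question: follower_id is the owner, following_id the viewer.
    INSERT INTO Following (follower_id, following_id, friend) VALUES (10, 20, 1);
""")

# The lookups are plain LEFT JOINs; the privacy logic lives in the WHERE clause.
VISIBLE_SQL = """
    SELECT DISTINCT t.id
    FROM posts t
    LEFT JOIN PostPrivacyIncludeFor i
           ON i.thought_id = t.id AND i.user_id = :viewer
    LEFT JOIN Following g
           ON g.follower_id = t.owner_id AND g.following_id = :viewer AND g.friend = 1
    WHERE t.owner_id = :viewer                           -- owners see their own posts
       OR t.privacy_level = 0                            -- everyone
       OR (t.privacy_level = 1 AND g.id IS NOT NULL)     -- friends only
       OR (t.privacy_level = 4 AND i.id IS NOT NULL)     -- specific people
    ORDER BY t.id
"""


def visible_post_ids(conn, viewer_id):
    """Return the ids of the posts that viewer_id is allowed to see."""
    return [row[0] for row in conn.execute(VISIBLE_SQL, {"viewer": viewer_id})]


print(visible_post_ids(conn, 20))  # [1, 2, 4]
```

With these sample rows, viewer 20 sees the public post, the friends-only post and the post shared with them, but not the owner-only post 3 or user 11's filtered post 5. The DISTINCT is there because a join that matches more than one row would otherwise repeat a post.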

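Answer 1: (score: 0)

I quickly simplified this without re-engineering your original design too much.

With this solution, your web page now only needs to call the following stored procedure to get a filtered list of posts for a given user within a specified time period.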

call list_user_filtered_posts( <user_id>, <day_interval> );

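The entire script can be found here: http://pastie.org/1212812

I haven't fully tested all of this, and you may find that this solution doesn't perform well enough for your needs, but it may help you fine-tune or modify your existing design.

Tables

I removed the post_privacy_exclude_from table and added a user_stalkers table, which works pretty much like the inverse of user_friends. I kept the original post_privacy_includes_for table as per your design, since it allows a user to restrict a specific post to a subset of people.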

drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varbinary(32) unique not null
)
engine=innodb;


drop table if exists user_friends;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
primary key (user_id, friend_user_id)
)
engine=innodb;


drop table if exists user_stalkers;
create table user_stalkers
(
user_id int unsigned not null,
stalker_user_id int unsigned not null,
primary key (user_id, stalker_user_id)
)
engine=innodb;


drop table if exists posts;
create table posts
(
post_id int unsigned not null auto_increment primary key,
user_id int unsigned not null,
privacy_level tinyint unsigned not null default 0,
post_date datetime not null,
key user_idx(user_id),
key post_date_user_idx(post_date, user_id)
)
engine=innodb;


drop table if exists post_privacy_includes_for;
create table post_privacy_includes_for
(
post_id int unsigned not null,
user_id int unsigned not null,
primary key (post_id, user_id)
)
engine=innodb;

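Stored procedure

The stored procedure is relatively simple: it first selects all posts within the specified time period and then filters posts out according to your original requirements. I haven't performance-tested this sproc with large volumes, but since the initial selection is relatively small, it should be performant enough, and it also simplifies your application/middle-tier code.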

drop procedure if exists list_user_filtered_posts;

delimiter #

create procedure list_user_filtered_posts
(
in p_user_id int unsigned,
in p_day_interval tinyint unsigned
)
proc_main:begin

 drop temporary table if exists tmp_posts;
 drop temporary table if exists tmp_priv_posts;

 -- select ALL posts in the required date range (or whatever selection criteria you require)

 create temporary table tmp_posts engine=memory 
 select 
  p.post_id, p.user_id, p.privacy_level, 0 as deleted 
 from 
  posts p
 where
  p.post_date between now() - interval p_day_interval day and now()  
 order by 
  p.user_id;

 -- purge stalker posts (0,1,3,4)

 update tmp_posts 
 inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id
 set
  tmp_posts.deleted = 1
 where
  tmp_posts.user_id != p_user_id;

 -- purge other users private posts (3)

 update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3;

 -- purge friend only posts (1) i.e where p_user_id is not a friend of the poster

 /*
  requires another temp table due to mysql temp table problem/bug
  http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html
 */

 -- the private posts (1) this user can see

 create temporary table tmp_priv_posts engine=memory 
 select
  tp.post_id
 from
  tmp_posts tp
 inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id
 where
  tp.user_id != p_user_id and tp.privacy_level = 1;

 -- remove friends-only posts this user can't see (the user's own posts are kept)

 update tmp_posts 
 left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
 set 
  tmp_posts.deleted = 1
 where 
  tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id;

 -- purge filtered (4)

 truncate table tmp_priv_posts; -- reuse tmp table

 insert into tmp_priv_posts
 select
  tp.post_id
 from
  tmp_posts tp
 inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id
 where
  tp.user_id != p_user_id and tp.privacy_level = 4;

 -- remove filtered posts this user can't see (the user's own posts are kept)

 update tmp_posts 
 left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
 set 
  tmp_posts.deleted = 1
 where 
  tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id;

 drop temporary table if exists tmp_priv_posts;

 -- output filtered posts (display ALL of these on web page)

 select 
  p.* 
 from 
  posts p
 inner join tmp_posts tp on p.post_id = tp.post_id
 where
  tp.deleted = 0
 order by
  p.post_id desc;

 -- clean up

 drop temporary table if exists tmp_posts;

end proc_main #

delimiter ;

Test data

Some basic test data.

insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega');

insert into user_friends values 
(1,2),(1,3),(1,5),
(2,1),(2,3),(2,4),
(3,1),(3,2),
(4,5),
(5,1),(5,4);

insert into user_stalkers values (4,1);

insert into posts (user_id, privacy_level, post_date) values

-- public (0)

(1,0,now() - interval 8 day),
(1,0,now() - interval 8 day),
(2,0,now() - interval 7 day),
(2,0,now() - interval 7 day),
(3,0,now() - interval 6 day),
(4,0,now() - interval 6 day),
(5,0,now() - interval 5 day),

-- friends only (1)

(1,1,now() - interval 5 day),
(2,1,now() - interval 4 day),
(4,1,now() - interval 4 day),
(5,1,now() - interval 3 day),

-- private (3)

(1,3,now() - interval 3 day),
(2,3,now() - interval 2 day),
(4,3,now() - interval 2 day),

-- filtered (4)

(1,4,now() - interval 1 day),
(4,4,now() - interval 1 day),
(5,4,now());

insert into post_privacy_includes_for values (15,4), (16,1), (17,6);

Testing

As I mentioned earlier, I haven't fully tested this, but on the surface it appears to be working.

select * from posts;

call list_user_filtered_posts(1,14);
call list_user_filtered_posts(6,14);

call list_user_filtered_posts(1,7);
call list_user_filtered_posts(6,7);
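For cross-checking calls like the ones above, the same visibility rules can also be written as a single query. The sketch below is a stand-in, not the procedure itself: it runs on SQLite through Python's sqlite3 module rather than MySQL, drops the date range, and assumes a user always sees their own posts (as in the question's query). Table names and test rows are taken from this answer; visible_posts_for is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_friends (user_id INT, friend_user_id INT,
                               PRIMARY KEY (user_id, friend_user_id));
    CREATE TABLE user_stalkers (user_id INT, stalker_user_id INT,
                                PRIMARY KEY (user_id, stalker_user_id));
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY AUTOINCREMENT,
                        user_id INT NOT NULL, privacy_level INT NOT NULL);
    CREATE TABLE post_privacy_includes_for (post_id INT, user_id INT,
                                            PRIMARY KEY (post_id, user_id));

    -- Same rows as the answer's test data, without the post dates.
    INSERT INTO user_friends VALUES
        (1,2),(1,3),(1,5),(2,1),(2,3),(2,4),(3,1),(3,2),(4,5),(5,1),(5,4);
    INSERT INTO user_stalkers VALUES (4,1);
    INSERT INTO posts (user_id, privacy_level) VALUES
        (1,0),(1,0),(2,0),(2,0),(3,0),(4,0),(5,0),   -- public (0), posts 1-7
        (1,1),(2,1),(4,1),(5,1),                     -- friends only (1), posts 8-11
        (1,3),(2,3),(4,3),                           -- private (3), posts 12-14
        (1,4),(4,4),(5,4);                           -- filtered (4), posts 15-17
    INSERT INTO post_privacy_includes_for VALUES (15,4),(16,1),(17,6);
""")

VISIBLE_SQL = """
    SELECT p.post_id
    FROM posts p
    WHERE p.user_id = :viewer                  -- own posts are always visible
       OR (
            -- the poster has not listed the viewer as a stalker
            NOT EXISTS (SELECT 1 FROM user_stalkers us
                        WHERE us.user_id = p.user_id
                          AND us.stalker_user_id = :viewer)
            AND (
                 p.privacy_level = 0
              OR (p.privacy_level = 1 AND EXISTS (
                    SELECT 1 FROM user_friends uf
                    WHERE uf.user_id = p.user_id
                      AND uf.friend_user_id = :viewer))
              OR (p.privacy_level = 4 AND EXISTS (
                    SELECT 1 FROM post_privacy_includes_for i
                    WHERE i.post_id = p.post_id
                      AND i.user_id = :viewer))
            )
          )
    ORDER BY p.post_id
"""


def visible_posts_for(conn, viewer_id):
    """Return the post ids visible to viewer_id under the rules above."""
    return [row[0] for row in conn.execute(VISIBLE_SQL, {"viewer": viewer_id})]


print(visible_posts_for(conn, 1))  # [1, 2, 3, 4, 5, 7, 8, 9, 11, 12, 15]
print(visible_posts_for(conn, 6))  # [1, 2, 3, 4, 5, 6, 7, 17]
```

User 4 lists user 1 as a stalker, so post 16 stays hidden from user 1 even though user 1 is on its include list. User 6 has no friends in the test data and therefore sees only the public posts plus post 17, which was shared with them explicitly.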

Hope you find some of this useful.