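MySQL optimization for a social graph of friends (group by friends)

Date: 2012-03-27 13:16:04

Tags: mysql optimization

I'm having trouble with the performance of a query, and with scaling it to users who have a large number of friends. The goal of the query is to get the top "activities" that your friends have performed in the last 30 days. Here is my query: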

SELECT a.activity_id, b.activity_name, count(a.activity_id) as total_count
FROM friends as f
INNER JOIN activities as a on (a.user_id = f.friend_id 
and a.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY))
INNER JOIN activity as b on a.activity_id = b.activity_id
WHERE f.user_id = 1 and f.is_approved = 1
GROUP by a.activity_id
ORDER by total_count DESC
LIMIT 5

This query takes 25 seconds to run for every user, no matter how large or small their friend graph is. The indexes are as follows:

Table: activities
PRIMARY: [act_id] Other: [activity_id, user_id], [user_id, created_at], [created_at]

Table: friends
PRIMARY: [user_id, friend_id] Other: [user_id, is_approved], [friend_id]

Table: activity
PRIMARY: [activity_id]

Any help is greatly appreciated.

Update: here is the EXPLAIN output

id  select_type  table  type    key            key_len  ref           rows  Extra
1   SIMPLE       F      ref     friend_lookup  5        const,const   795   Using temporary; Using filesort
1   SIMPLE       A      ref     user_id        4        F.friend_id   58    Using where
1   SIMPLE       B      eq_ref  PRIMARY        4        P.activty_id  1     Using where

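3 answers:

Answer 0: (score: 2)

Robin is correct about the date field. If you use a function there, it has to be computed for however many entries it is scanning against. Below I use a MySQL variable instead: I compute the value once into @StartDate and use THAT value in the join clause.

The only additional thing I changed was adding the "STRAIGHT_JOIN" clause. In many cases I have found it helps me and others optimize queries. It stops MySQL from trying to interpret the query another way, for example looking at the Activity table first because it is a smaller file and then linking backwards from there. "STRAIGHT_JOIN" tells the optimizer to join the tables in the order you have listed them.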

SELECT STRAIGHT_JOIN
      a.activity_id, 
      b.activity_name, 
      count(a.activity_id) as total_count
   FROM 
      ( select @StartDate := date_Sub( now(), interval 30 day ) ) sqlvars,
      friends as f
         INNER JOIN activities as a 
            on a.user_id = f.friend_id 
           and a.created_at >= @StartDate
         INNER JOIN activity as b 
            on a.activity_id = b.activity_id
   WHERE 
          f.user_id = 1 
      and f.is_approved = 1
   GROUP by 
      a.activity_id
   ORDER by 
      total_count DESC
   LIMIT 5

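Per feedback

Since that is the case, and given this "rolling 30 days ago" cycle, I would resort to a nightly table creation that holds nothing more than the user ID, the activity, and the count, and build it from a query like this: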

create table DailyRollupActivity
select a.user_id,
       a.activity_id,
       count(*) total_count
   from
      ( select @StartDate := date_Sub( now(), interval 30 day ) ) sqlvars,
      activities a
   where
      a.created_at >= @StartDate
   group by
      a.User_ID,
      a.Activity_ID

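Make sure this daily aggregate table has an index on (user ID, total count), then query it directly by your friends' IDs, ordered by total_count descending with a limit of 5. The small price you pay is a nightly trigger/event/script that builds the table ONCE. How critical is it to see activity from the current day? Would one day of heavy activity really skew what you want to present to the user?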
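
As a rough sketch of the index and lookup described above, something like the following might work. The index name `idx_user_total` is made up for illustration, and summing the per-friend counts is an assumption made here so that the result matches the semantics of the original query (one total per activity across all approved friends).

```sql
-- Hypothetical index name; columns follow the "(user ID, total count)" suggestion.
ALTER TABLE DailyRollupActivity
   ADD INDEX idx_user_total (user_id, total_count);

-- Top 5 activities among user 1's approved friends, read from the nightly rollup.
-- Assumption: counts are summed across friends to mirror the original query.
SELECT r.activity_id,
       b.activity_name,
       SUM(r.total_count) AS total_count
   FROM friends AS f
      INNER JOIN DailyRollupActivity AS r
         ON r.user_id = f.friend_id
      INNER JOIN activity AS b
         ON b.activity_id = r.activity_id
   WHERE f.user_id = 1
     AND f.is_approved = 1
   GROUP BY r.activity_id, b.activity_name
   ORDER BY SUM(r.total_count) DESC
   LIMIT 5;
```

Because the rollup holds at most one row per (user, activity), this reads far fewer rows than scanning 30 days of raw `activities` for every friend.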

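Answer 1: (score: 0)

It seems this is a time for denormalization.

If you only store one degree of separation, this is easy. When an activity occurs, record a "friend activity" for each friend. This moves the load onto the requests of the people performing the activities.

Keep this in mind: once an activity has occurred, it cannot "un-occur" (although you might delete the record of it from the feed). This lets you take a more transactional logging approach for the sake of performance.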
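
A minimal sketch of that write-time fan-out could look like the following. The table `friend_activities`, its index name, the INT column types, and the literal IDs 42 and 7 are all assumptions for illustration; only `friends` and `activity` come from the question.

```sql
-- Hypothetical fan-out table: one row per (recipient, activity event).
CREATE TABLE friend_activities (
   user_id     INT NOT NULL,       -- the friend who will see the activity
   activity_id INT NOT NULL,
   created_at  DATETIME NOT NULL,
   INDEX idx_user_created (user_id, created_at, activity_id)
);

-- When user 42 performs activity 7, log one row for each user
-- who lists 42 as an approved friend.
INSERT INTO friend_activities (user_id, activity_id, created_at)
SELECT f.user_id, 7, NOW()
   FROM friends AS f
   WHERE f.friend_id = 42
     AND f.is_approved = 1;

-- Reading the top 5 for user 1 no longer needs to walk the friend graph.
SELECT fa.activity_id,
       b.activity_name,
       COUNT(*) AS total_count
   FROM friend_activities AS fa
      INNER JOIN activity AS b
         ON b.activity_id = fa.activity_id
   WHERE fa.user_id = 1
     AND fa.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
   GROUP BY fa.activity_id, b.activity_name
   ORDER BY total_count DESC
   LIMIT 5;
```

The trade-off is extra writes and storage proportional to the number of friends per activity, in exchange for a read that touches a single user's rows.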

Answer 2: (score: 0)

Try changing the query to this as a start:

$str_date = date('Y-m-d H:i:s', strtotime('today -30 Days'));

SELECT a.activity_id, b.activity_name, count(a.activity_id) as total_count
FROM (  SELECT friend_id
        FROM friends
        WHERE user_id = 1 and is_approved = 1) as f
INNER JOIN (    SELECT user_id, activity_id
                FROM activities
                WHERE created_at >= '{$str_date}') as a
on a.user_id = f.friend_id 
INNER JOIN activity as b on a.activity_id = b.activity_id
GROUP by a.activity_id
ORDER by total_count DESC
LIMIT 5

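Basically, it filters on user_id and is_approved before joining the other tables. It is also better to generate the date in PHP (or whatever language you use) and pass that value to MySQL than to let MySQL calculate exactly the same thing (possibly thousands of times).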