Help optimizing a query (shows the strength of two-way relationships between contacts)

Date: 2010-06-30 05:20:12

Tags: sql optimization social-networking

I have a contact_relationship table that stores the strength of one contact's relationship with another contact, as reported at a given point in time.

mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field            | Type      | Null | Key | Default           | Extra                       |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id  | int(11)   | YES  |     | NULL              |                             |
| contact_id       | int(11)   | YES  | MUL | NULL              |                             |
| other_contact_id | int(11)   | YES  |     | NULL              |                             |
| strength         | int(11)   | YES  |     | NULL              |                             |
| recorded         | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+

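Now I want to get a list of the two-way relationships between contacts. A two-way relationship means there are two rows: one in which contact a specifies the strength of its relationship with contact b, and another in which contact b specifies the strength of its relationship with contact a. The strength of the two-way relationship is the smaller of those two strength values.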

This is the query I came up with, but it is quite slow:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from ( 
    select 
        cr1.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr2 
    inner join contact_relationship cr1 on 
        cr1.contact_id = cr2.contact_id 
        and cr1.other_contact_id = cr2.other_contact_id 
        and cr1.recorded = cr2.max_recorded 
) as mrcr1, 
( 
    select 
        cr3.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr4 
    inner join contact_relationship cr3 on 
        cr3.contact_id = cr4.contact_id 
        and cr3.other_contact_id = cr4.other_contact_id 
        and cr3.recorded = cr4.max_recorded 
) as mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id; 

Does anyone have any suggestions on how to speed it up?

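Note that because a user may specify the strength of their relationship with a particular user more than once, only the most recent record for each pair of contacts must be used.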
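To make the intended result concrete, here is a minimal reference sketch in plain Python rather than SQL. The sample rows and the function name `mutual_strengths` are hypothetical and only illustrate the two rules stated above: keep the latest report per ordered pair, then take the smaller strength when both directions exist. It is meant as a correctness check for any faster query, not as a replacement for it.

```python
from datetime import datetime

# Hypothetical sample rows: (contact_id, other_contact_id, strength, recorded)
rows = [
    (1, 2, 5, datetime(2010, 6, 1)),
    (1, 2, 8, datetime(2010, 6, 20)),  # newer report from 1 about 2 wins
    (2, 1, 3, datetime(2010, 6, 10)),
    (1, 3, 7, datetime(2010, 6, 5)),   # one-way only: 3 never rated 1
    (4, 4, 9, datetime(2010, 6, 7)),   # self-relationship, excluded
]


def mutual_strengths(rows):
    """Return {(a, b): strength} with a < b for pairs rated in both directions."""
    # Step 1: keep only the most recent report per ordered pair.
    latest = {}  # (contact_id, other_contact_id) -> (recorded, strength)
    for contact, other, strength, recorded in rows:
        key = (contact, other)
        if key not in latest or recorded > latest[key][0]:
            latest[key] = (recorded, strength)

    # Step 2: a pair is two-way when the reverse direction also exists;
    # its strength is the smaller of the two latest values.
    result = {}
    for (a, b), (_, strength_ab) in latest.items():
        if a < b and (b, a) in latest:
            result[(a, b)] = min(strength_ab, latest[(b, a)][1])
    return result


print(mutual_strengths(rows))  # {(1, 2): 3}
```

Only the pair (1, 2) qualifies: the latest report from 1 is 8, the latest from 2 is 3, so the two-way strength is 3.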

Update: here is the result of running EXPLAIN on the query...

+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table                | type  | possible_keys                                                                          | key                          | key_len | ref                                 | rows  | Extra                          |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
|  1 | PRIMARY     | <derived2>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where                    |
|  1 | PRIMARY     | <derived4>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where; Using join buffer |
|  4 | DERIVED     | <derived5>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  4 | DERIVED     | cr3                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr4.contact_id,cr4.other_contact_id |     1 | Using where                    |
|  5 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
|  2 | DERIVED     | <derived3>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  2 | DERIVED     | cr1                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr2.contact_id,cr2.other_contact_id |     1 | Using where                    |
|  3 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+

2 Answers:

Answer 0: (score: 0)

You are losing a lot of time selecting the most recent records. Two options:

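1- Change the way you store the data: keep one table that holds only the most recent records, and a separate table that acts as a history.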

2- If your DBMS allows it, use an analytic (window) query to select the most recent records. Something like:

select distinct contact_id, other_contact_id,
    first_value(strength) over (partition by contact_id, other_contact_id order by recorded desc) as strength
from contact_relationship

Once you have the right set of rows, I think your query will be faster.
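Option 1 could look roughly like the sketch below. It uses Python's built-in sqlite3 module so that it runs without a MySQL server. The table names `contact_relationship_history` and `contact_relationship_current` and the helper `report_strength` are hypothetical; only the column names come from the question. It also assumes reports are written in chronological order, so the last write for a pair is the latest one. In MySQL, `REPLACE INTO` and `LEAST()` would play the roles of `INSERT OR REPLACE` and the two-argument `min()` used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table names; only the column names come from the question.
    CREATE TABLE contact_relationship_history (
        contact_id INTEGER, other_contact_id INTEGER,
        strength INTEGER, recorded TEXT
    );
    CREATE TABLE contact_relationship_current (
        contact_id INTEGER, other_contact_id INTEGER, strength INTEGER,
        PRIMARY KEY (contact_id, other_contact_id)
    );
""")


def report_strength(contact_id, other_contact_id, strength, recorded):
    """Append to the history and overwrite the current row for the pair."""
    with conn:  # both writes commit (or roll back) together
        conn.execute(
            "INSERT INTO contact_relationship_history VALUES (?, ?, ?, ?)",
            (contact_id, other_contact_id, strength, recorded))
        conn.execute(
            "INSERT OR REPLACE INTO contact_relationship_current VALUES (?, ?, ?)",
            (contact_id, other_contact_id, strength))


# With one row per ordered pair, the two-way query is a plain self-join.
MUTUAL_SQL = """
    SELECT a.contact_id, a.other_contact_id,
           min(a.strength, b.strength) AS strength
    FROM contact_relationship_current AS a
    JOIN contact_relationship_current AS b
      ON b.contact_id = a.other_contact_id
     AND b.other_contact_id = a.contact_id
    WHERE a.contact_id < a.other_contact_id
    ORDER BY a.contact_id, a.other_contact_id
"""

# Hypothetical reports, written in chronological order.
report_strength(1, 2, 5, "2010-06-01")
report_strength(1, 3, 7, "2010-06-05")
report_strength(2, 1, 3, "2010-06-10")
report_strength(1, 2, 8, "2010-06-20")

mutual = conn.execute(MUTUAL_SQL).fetchall()
print(mutual)  # [(1, 2, 3)]
```

The trade-off is a second write on every report in exchange for a read path that no longer needs the group-by and max(recorded) step.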

Answer 1: (score: 0)

Scorpi0's answer got me thinking that maybe I could use a temporary table...

create temporary table mrcr1 (
    contact_id int, 
    other_contact_id int, 
    strength int, 
    index mrcr1_index_1 (
        contact_id, 
        other_contact_id
    )
) replace as 
    select 
        cr1.contact_id, 
        cr1.other_contact_id, 
        cr1.strength from ( 
            select 
                contact_id, 
                other_contact_id, 
                max(recorded) as max_recorded 
            from 
                contact_relationship 
            group by 
                contact_id, other_contact_id 
        ) as cr2 
        inner join 
            contact_relationship cr1 on 
                cr1.contact_id = cr2.contact_id 
                and cr1.other_contact_id = cr2.other_contact_id 
                and cr1.recorded = cr2.max_recorded;

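I had to do this twice (the second time into a temporary table named mrcr2) because MySQL has a limitation: you cannot alias the same temporary table twice in a single query.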

With my two temporary tables created, my query then becomes:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from 
    mrcr1,
    mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id;
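
For anyone who wants to try the temporary-table approach on a small data set, here is a rough end-to-end sketch using Python's built-in sqlite3 module instead of MySQL. The sample rows and the table name `mrcr` are hypothetical. SQLite allows a temporary table to be joined to itself, so a single copy is enough here; in MySQL, per the limitation described above, the second copy (mrcr2) would still be needed. The single `<` condition stands in for the two `!=` checks plus `<=`, which select the same rows once the join conditions hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact_relationship (
        contact_id INTEGER, other_contact_id INTEGER,
        strength INTEGER, recorded TEXT
    );
    -- Hypothetical sample data.
    INSERT INTO contact_relationship VALUES
        (1, 2, 5, '2010-06-01 00:00:00'),
        (1, 2, 8, '2010-06-20 00:00:00'),
        (2, 1, 3, '2010-06-10 00:00:00'),
        (1, 3, 7, '2010-06-05 00:00:00');

    -- Materialize the latest row per (contact_id, other_contact_id) once.
    CREATE TEMPORARY TABLE mrcr AS
        SELECT cr1.contact_id, cr1.other_contact_id, cr1.strength
        FROM (
            SELECT contact_id, other_contact_id, max(recorded) AS max_recorded
            FROM contact_relationship
            GROUP BY contact_id, other_contact_id
        ) AS cr2
        JOIN contact_relationship AS cr1
          ON cr1.contact_id = cr2.contact_id
         AND cr1.other_contact_id = cr2.other_contact_id
         AND cr1.recorded = cr2.max_recorded;

    -- Index the join columns, mirroring mrcr1_index_1 above.
    CREATE INDEX mrcr_index_1 ON mrcr (contact_id, other_contact_id);
""")

# Join each latest row to its reverse-direction row and keep the
# smaller strength; a < b reports each pair once and skips self-pairs.
mutual = conn.execute("""
    SELECT m1.contact_id, m1.other_contact_id,
           CASE WHEN m1.strength < m2.strength
                THEN m1.strength ELSE m2.strength END AS strength
    FROM mrcr AS m1
    JOIN mrcr AS m2
      ON m1.contact_id = m2.other_contact_id
     AND m1.other_contact_id = m2.contact_id
    WHERE m1.contact_id < m1.other_contact_id
""").fetchall()
print(mutual)  # [(1, 2, 3)]
```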