MySQL: suggesting objects (optimizing a multi-join query)

Date: 2009-10-13 14:25:54

Tags: sql mysql performance optimization

Goal: suggest objects based on a user's choices

Data: a table recording how each user ordered a subset of objects from worst to best; for example:

          1 2 3 4 5 6
    John: A B G J S O
    Mary: A C G L
    Joan: B C L J K
    Stan: G J C L

The number of users is roughly 20× the number of objects, and each user's lineup contains 50-200 objects.

Table:

CREATE TABLE IF NOT EXISTS `pref` (
  `usr` int(10) unsigned NOT NULL,
  `obj` int(10) unsigned NOT NULL,
  `ord` int(10) unsigned NOT NULL,
  UNIQUE KEY `u_o` (`usr`,`obj`),
  KEY `u` (`usr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
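As a hedged illustration of how the example lineups map onto this table, the sketch below flattens them into (usr, obj, ord) rows, where ord is the 1-based position from worst to best. It uses Python's built-in sqlite3 module as a stand-in for MySQL and text ids such as "John" and "A" instead of INT UNSIGNED; both are assumptions made only for the example, and the names LINEUPS and lineups_to_rows are hypothetical.

```python
import sqlite3

# The example lineups from the question, worst (position 1) to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}


def lineups_to_rows(lineups):
    """Flatten {user: [obj, ...]} into (usr, obj, ord) rows; ord is the 1-based position."""
    return [
        (usr, obj, pos)
        for usr, objs in lineups.items()
        for pos, obj in enumerate(objs, start=1)
    ]


# SQLite stand-in for the MySQL table (assumption: text ids instead of INT UNSIGNED).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pref ("
    " usr TEXT NOT NULL, obj TEXT NOT NULL, ord INTEGER NOT NULL,"
    " UNIQUE (usr, obj))"
)
conn.executemany("INSERT INTO pref (usr, obj, ord) VALUES (?, ?, ?)", lineups_to_rows(LINEUPS))

print(conn.execute("SELECT COUNT(*) FROM pref").fetchone()[0])  # 19
```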

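Basic idea: iterate over the user's objects starting from the second-worst, building pairs (A > B); then look these pairs up in other users' lineups and list the items those users rank higher than A.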
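The basic idea can be sketched in plain Python before turning to SQL. This is a minimal, unoptimized sketch assuming the example lineups shown earlier; the suggest function is hypothetical. It follows the prose literally by skipping the current user when scanning "other users'" lineups, and it counts one hit per (B, other user, better object) combination.

```python
from collections import Counter

# Example lineups, worst (position 1) to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}


def suggest(lineups, user, a):
    """Score objects for `user`, seeded by one of the user's own objects `a`."""
    # rank[u][obj] = position of obj in u's lineup (higher is better)
    rank = {u: {obj: pos for pos, obj in enumerate(objs, start=1)} for u, objs in lineups.items()}
    mine = rank[user]
    # Every B that the current user ranks below A gives a pair (A > B)
    worse_than_a = [b for b, pos in mine.items() if pos < mine[a]]
    scores = Counter()
    for b in worse_than_a:
        for other, r in rank.items():
            if other == user:
                continue  # "other users" only
            # Does this user also rank A above B?
            if a in r and b in r and r[b] < r[a]:
                # Count every object this user ranks above A
                for e, pos in r.items():
                    if pos > r[a]:
                        scores[e] += 1
    return scores


print(suggest(LINEUPS, "John", "J"))
```

For John and A = J, the pair (J > B) is confirmed by Joan and the pair (J > G) by Stan, so K, C and L each get a count of 1.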

Query:

SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e

WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X
AND a.obj = '111'
AND b.usr = a.usr
AND b.ord < a.ord

AND c.obj = a.obj # step 2: find users thinking that object A is better than B
AND d.obj = b.obj
AND d.ord < c.ord
AND d.usr = c.usr 

AND e.ord > c.ord # step 3: find objects better than A according to these users
AND e.usr = c.usr

GROUP BY e.obj
ORDER BY rate DESC;

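Aliases:
a: object A ('111'), current user ('222')
b: object B, which the current user rates worse than A (its 'ord' is lower than A's)
c: object A in other users' lineups
d: object B in other users' lineups
e: objects better than A in other users' lineups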
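To check what the query actually counts, the sketch below runs it unchanged in structure against the example lineups. Assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the `#` comments become `--`, the literal ids '222' and '111' become named parameters, and users and objects are text ids.

```python
import sqlite3

# Example lineups, worst (position 1) to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}

# The question's five-way self-join, ported to SQLite syntax.
QUERY = """
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = :usr AND a.obj = :obj   -- step 1: A, and every B the user ranks below A
  AND b.usr = a.usr AND b.ord < a.ord
  AND c.obj = a.obj AND d.obj = b.obj  -- step 2: users who also rank A above B
  AND d.usr = c.usr AND d.ord < c.ord
  AND e.usr = c.usr AND e.ord > c.ord  -- step 3: objects those users rank above A
GROUP BY e.obj
ORDER BY rate DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pref (usr TEXT NOT NULL, obj TEXT NOT NULL, ord INTEGER NOT NULL,"
    " UNIQUE (usr, obj))"
)
conn.executemany(
    "INSERT INTO pref (usr, obj, ord) VALUES (?, ?, ?)",
    [(usr, obj, pos) for usr, objs in LINEUPS.items() for pos, obj in enumerate(objs, start=1)],
)

rows = conn.execute(QUERY, {"usr": "John", "obj": "G"}).fetchall()
print(rows)  # order among equal rates is not guaranteed
```

For John and A = G this gives J, S and O a rate of 2 and L a rate of 1. Nothing in the WHERE clause requires `c.usr <> a.usr`, so the current user's own lineup matches too: J, S and O come entirely from John himself, once per B. If only other users should count, adding `AND c.usr <> a.usr` would leave just L.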

Execution plan (ouo and uo are the indexes suggested by Quassnoi):

+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref                 | rows | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
|  1 | SIMPLE      | a     | ref  | ouo,uo        | ouo  | 8       | const,const         |    1 | Using index; Using temporary; Using filesort | 
|  1 | SIMPLE      | b     | ref  | ouo,uo        | uo   | 4       | const               |   86 | Using where                                  | 
|  1 | SIMPLE      | d     | ref  | ouo,uo        | ouo  | 4       | db.b.obj            |  587 | Using index                                  | 
|  1 | SIMPLE      | c     | ref  | ouo,uo        | ouo  | 8       | const,db.d.usr      |    1 | Using where; Using index                     | 
|  1 | SIMPLE      | e     | ref  | uo            | uo   | 4       | db.d.usr            |   80 | Using where                                  | 
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+

The query seems to work fine as long as the dataset isn't too large. Any ideas on how to simplify it so that it scales to larger datasets?

1 answer:

Answer 0 (score: 3)

The query is fine; just create the following indexes:

pref (obj, usr, ord)
pref (usr, ord)
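In MySQL these two indexes could be added with `ALTER TABLE pref ADD INDEX ouo (obj, usr, ord), ADD INDEX uo (usr, ord);`, using the names that appear in the question's execution plan. The sketch below creates the same indexes in SQLite (via Python's sqlite3, an assumption made so the example is self-contained) and asks the planner how it would read every (usr, ord) for one object. The helper index_columns is hypothetical, and SQLite's planner is not MySQL's, so this only illustrates the "covering index" idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pref (usr INTEGER NOT NULL, obj INTEGER NOT NULL, ord INTEGER NOT NULL,"
    " UNIQUE (usr, obj))"
)
# (obj, usr, ord): every user who rated an object, plus its position, read from the index alone
conn.execute("CREATE INDEX ouo ON pref (obj, usr, ord)")
# (usr, ord): one user's lineup in order, so "usr = ? AND ord < ?" can be an index range scan
conn.execute("CREATE INDEX uo ON pref (usr, ord)")


def index_columns(conn, name):
    """Column names of an index in key order (PRAGMA index_info rows are seqno, cid, name)."""
    return [row[2] for row in conn.execute(f"PRAGMA index_info({name})")]


plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT usr, ord FROM pref WHERE obj = ?", (111,)
).fetchall()
print(index_columns(conn, "ouo"), index_columns(conn, "uo"))
print(plan[0][-1])  # the planner's description of how it reads pref
```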

Update

Try this syntax.

The rating system is simpler but very similar: on the random test data I generated, it gives almost the same ratings.

SELECT  oa.obj, SUM(weight) AS rate
FROM    (
        SELECT  usr, ord,
                (
                SELECT  COUNT(*)
                FROM    pref a
                JOIN    pref ob
                ON      ob.obj = a.obj
                WHERE   ob.usr = o.usr
                        AND a.usr = 50
                        AND a.ord <
                        (
                        SELECT  ord
                        FROM    pref ai
                        WHERE   ai.usr = 50
                                AND ai.obj = 75
                        )
                        AND ob.ord < o.ord
                ) AS weight
        FROM    pref o
        WHERE   o.obj = 75
        HAVING  weight >= 0
        ) ow
JOIN    pref oa
ON      oa.usr = ow.usr
        AND oa.ord > ow.ord
GROUP BY
        oa.obj
ORDER BY
        rate DESC

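For each item, this query sums the weights of all users who rated A and rated that item higher than A.

A user's weight equals the number of items that both users (the current user and that user) rated below A.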
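The weighting can be sketched in plain Python on the example lineups. This is a minimal sketch of the logic described above, not the answer's SQL itself; the function name weighted_suggest is hypothetical. Like the query, it does not exclude the current user (`o.obj = 75` also matches user 50's own row) and keeps users whose weight is 0, since `HAVING weight >= 0` is always true.

```python
# Example lineups, worst (position 1) to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}


def weighted_suggest(lineups, user, a):
    """Rate objects for `user` from every lineup that contains object `a`."""
    rank = {u: {obj: pos for pos, obj in enumerate(objs, start=1)} for u, objs in lineups.items()}
    mine = rank[user]
    below_mine = {obj for obj, pos in mine.items() if pos < mine[a]}
    rates = {}
    for other, r in rank.items():
        if a not in r:
            continue  # only users who rated A (o.obj = A)
        # Weight: number of objects both users rank below A
        weight = len(below_mine & {obj for obj, pos in r.items() if pos < r[a]})
        # Every object this user ranks above A receives the weight (oa.ord > ow.ord)
        for e, pos in r.items():
            if pos > r[a]:
                rates[e] = rates.get(e, 0) + weight
    # ORDER BY rate DESC
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


print(weighted_suggest(LINEUPS, "John", "G"))
```

For John and A = G, John's own row gets weight 2 (A and B lie below G for both), Mary's gets 1 (only A is shared) and Stan's gets 0, so J, S and O score 2, L scores 1 and C scores 0.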