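Goal: suggest objects based on users' choices.
Data: a table recording how each user orders a subset of the objects from worst to best; for example: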
      1 2 3 4 5 6
John: A B G J S O
Mary: A C G L
Joan: B C L J K
Stan: G J C L
The number of objects is about 20 times the number of users, and each user's lineup contains 50-200 objects.
The table:
CREATE TABLE IF NOT EXISTS `pref` (
`usr` int(10) unsigned NOT NULL,
`obj` int(10) unsigned NOT NULL,
`ord` int(10) unsigned NOT NULL,
UNIQUE KEY `u_o` (`usr`,`obj`),
KEY `u` (`usr`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
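To make the layout concrete, here is a minimal sketch that loads the example lineups into a table shaped like `pref`. It is an illustration only: it uses Python's built-in sqlite3 instead of MySQL, stores names and letters as text where the real table uses unsigned ints, and the helper name `build_pref` is made up for this example.

```python
import sqlite3

# Example lineups from the question, each ordered worst to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}


def build_pref(lineups):
    """Return an in-memory SQLite database with a populated pref table."""
    conn = sqlite3.connect(":memory:")
    # Assumption: TEXT columns for readability; the real table uses ints.
    conn.execute(
        "CREATE TABLE pref ("
        " usr TEXT NOT NULL,"
        " obj TEXT NOT NULL,"
        " ord INTEGER NOT NULL,"
        " UNIQUE (usr, obj))"
    )
    # ord is the 1-based position in the lineup, so a higher ord means better.
    rows = [
        (usr, obj, position)
        for usr, objs in lineups.items()
        for position, obj in enumerate(objs, start=1)
    ]
    conn.executemany("INSERT INTO pref (usr, obj, ord) VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


conn = build_pref(LINEUPS)
print(conn.execute("SELECT COUNT(*) FROM pref").fetchone()[0])  # 19 rows
```

Each (user, object) pair becomes one row, which is what the UNIQUE KEY on (usr, obj) enforces.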
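Basic idea: iterate over a user's objects, starting from the second worst, building pairs (A > B); for each pair, look it up in other users' lineups and list the items those users rank better than A.
The query: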
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X
AND a.obj = '111'
AND b.usr = a.usr
AND b.ord < a.ord
AND c.obj = a.obj # step 2: find users thinking that object A is better than B
AND d.obj = b.obj
AND d.ord < c.ord
AND d.usr = c.usr
AND e.ord > c.ord # step 3: find objects better than A according to these users
AND e.usr = c.usr
GROUP BY e.obj
ORDER BY rate DESC;
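Aliases:
a: object A ('111') for the current user ('222')
b: object B, worse than A according to the current user (has a lower 'ord' value than A)
c: object A in another user's lineup
d: object B in another user's lineup
e: objects better than A in another user's lineup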
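The three steps can be tried on the example lineups with the sketch below. It runs the same five-way self-join in an in-memory SQLite database (a stand-in for MySQL, so it says nothing about performance), with Mary as the current user and G as object A. The names `run_suggestion` and `build_pref` are hypothetical helpers, and the extra `e.obj` sort key is added only to make the output order deterministic.

```python
import sqlite3

# Example lineups from the question, each ordered worst to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}

QUERY = """
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = ?          -- step 1: pair (A, B), A better than B for user X
  AND a.obj = ?
  AND b.usr = a.usr
  AND b.ord < a.ord
  AND c.obj = a.obj      -- step 2: users who also rank A above B
  AND d.obj = b.obj
  AND d.ord < c.ord
  AND d.usr = c.usr
  AND e.ord > c.ord      -- step 3: objects those users rank above A
  AND e.usr = c.usr
GROUP BY e.obj
ORDER BY rate DESC, e.obj
"""


def build_pref(lineups):
    """Create an in-memory pref table (TEXT ids are an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pref (usr TEXT NOT NULL, obj TEXT NOT NULL,"
        " ord INTEGER NOT NULL, UNIQUE (usr, obj))"
    )
    conn.executemany(
        "INSERT INTO pref (usr, obj, ord) VALUES (?, ?, ?)",
        [(u, o, i) for u, objs in lineups.items()
         for i, o in enumerate(objs, start=1)],
    )
    return conn


def run_suggestion(conn, usr, obj):
    """Return (object, rate) rows for the given user and object A."""
    return conn.execute(QUERY, (usr, obj)).fetchall()


result = run_suggestion(build_pref(LINEUPS), "Mary", "G")
print(result)  # [('L', 2), ('J', 1), ('O', 1), ('S', 1)]
```

Note that nothing in the WHERE clause excludes the current user from alias c, so Mary's own lineup contributes to the counts; that is why L, which only Mary ranks above G, comes out on top here.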
Execution plan (ouo and uo are the indexes suggested by Quassnoi):
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
| 1 | SIMPLE | a | ref | ouo,uo | ouo | 8 | const,const | 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | b | ref | ouo,uo | uo | 4 | const | 86 | Using where |
| 1 | SIMPLE | d | ref | ouo,uo | ouo | 4 | db.b.obj | 587 | Using index |
| 1 | SIMPLE | c | ref | ouo,uo | ouo | 8 | const,db.d.usr | 1 | Using where; Using index |
| 1 | SIMPLE | e | ref | uo | uo | 4 | db.d.usr | 80 | Using where |
+----+-------------+-------+------+---------------+------+---------+---------------------+------+----------------------------------------------+
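The query seems to work fine as long as the dataset isn't too large; any ideas on how to streamline it to support larger datasets?
Answer 0 (score: 3)
The query is fine; just create the following indexes: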
pref (obj, usr, ord)
pref (usr, ord)
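Spelled out as DDL, the two indexes might look like the sketch below. The index names `ouo` and `uo` are taken from the execution plan in the question; the statements are executed against an empty SQLite table only to show that they are valid, and the SQLite planner will not necessarily choose them the way MySQL does.

```python
import sqlite3

# Assumed DDL for the two suggested indexes; names come from the EXPLAIN output.
INDEX_DDL = [
    "CREATE INDEX ouo ON pref (obj, usr, ord)",
    "CREATE INDEX uo ON pref (usr, ord)",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pref (usr INTEGER NOT NULL, obj INTEGER NOT NULL,"
    " ord INTEGER NOT NULL, UNIQUE (usr, obj))"
)
for ddl in INDEX_DDL:
    conn.execute(ddl)

# PRAGMA index_list returns one row per index; column 1 holds the index name.
index_names = {row[1] for row in conn.execute("PRAGMA index_list('pref')")}
print(sorted(index_names))
```

Both indexes contain every column the query touches, which fits the "Using index" entries in the plan above.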
Update:
Try this syntax.
The rating system is simpler but very similar: it gives almost the same ratings on the random test data I created.
SELECT oa.obj, SUM(weight) AS rate
FROM (
SELECT usr, ord,
(
SELECT COUNT(*)
FROM pref a
JOIN pref ob
ON ob.obj = a.obj
WHERE ob.usr = o.usr
AND a.usr = 50
AND a.ord <
(
SELECT ord
FROM pref ai
WHERE ai.usr = 50
AND ai.obj = 75
)
AND ob.ord < o.ord
) AS weight
FROM pref o
WHERE o.obj = 75
HAVING weight >= 0
) ow
JOIN pref oa
ON oa.usr = ow.usr
AND oa.ord > ow.ord
GROUP BY
oa.obj
ORDER BY
rate DESC
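For every user who has rated A, this query assigns a weight to each item that user rated higher than A.
The weight equals the number of items that both users rated below A.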
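The weighting scheme can also be sketched in plain Python, which makes the arithmetic easier to follow. This is an illustration under assumptions rather than a translation of the SQL: it reuses the example lineups from the question, the function name `suggest` is hypothetical, and, like the query, it does not exclude the current user's own lineup.

```python
from collections import defaultdict

# Example lineups from the question, each ordered worst to best.
LINEUPS = {
    "John": ["A", "B", "G", "J", "S", "O"],
    "Mary": ["A", "C", "G", "L"],
    "Joan": ["B", "C", "L", "J", "K"],
    "Stan": ["G", "J", "C", "L"],
}


def suggest(lineups, current_user, target):
    """Rate the objects that users who ranked `target` place above it."""
    mine = lineups[current_user]
    # Objects the current user ranks below the target object A.
    below_mine = set(mine[: mine.index(target)])
    rates = defaultdict(int)
    for objs in lineups.values():
        if target not in objs:
            continue  # this user never ranked A
        pos = objs.index(target)
        # Weight: objects both users rank below A.
        weight = len(below_mine & set(objs[:pos]))
        # Every object this user ranks above A receives that weight.
        for obj in objs[pos + 1:]:
            rates[obj] += weight
    # Highest rate first; ties broken alphabetically for a stable order.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))


ratings = suggest(LINEUPS, "Mary", "G")
print(ratings)  # [('L', 2), ('J', 1), ('O', 1), ('S', 1), ('C', 0)]
```

Stan ranks G last, so he shares no lower-ranked objects with Mary and contributes a weight of 0; his object C therefore appears with a rate of 0 instead of being dropped.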