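In a MySQL database, I am trying to find the row that is most similar to a given row across several numeric attributes. This problem is similar to this question, but it involves a flexible number of comparisons and a joined table.
The database consists of two tables. The first table holds the users, which are what I am trying to compare.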
id | self_ranking
----------------------------------
1 | 9
2 | 3
3 | 2
The second table holds the scores that users have given to specific items.
id | user_id | item_id | score
----------------------------------
1 | 1 | 1 | 4
2 | 1 | 2 | 5
3 | 1 | 3 | 8
4 | 1 | 4 | 3
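I want to find the user who is "most similar" to a given user, weighting all scored items (and the self-ranking) equally. An exact match would therefore be a user who has scored all of the same items in exactly the same way and has the same self-ranking, while the next best match would be a user whose score for one item differs slightly.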
I am having difficulty with this.
Can someone help me build a reasonable query? I am not very strong with MySQL, so apologies if the answer is obvious.
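If user 4 has ranked themselves as 8 and has scored item 1 => 4 and item 2 => 5, then I would expect a query for user 4's nearest user to return 1, the user_id of the nearest user.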
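To make the expected result concrete, here is a minimal Python sketch of one possible reading of "most similar". It assumes the distance between two users is the sum of absolute score differences over the items both have scored, plus the absolute difference in self_ranking. The names `distance` and `nearest_user` are illustrative only, and users 2 and 3 are given no scores because none are listed in the table above.

```python
# Sample data from the question; user 4 is the example described in the text.
users = {
    1: {"self_ranking": 9, "scores": {1: 4, 2: 5, 3: 8, 4: 3}},
    2: {"self_ranking": 3, "scores": {}},
    3: {"self_ranking": 2, "scores": {}},
    4: {"self_ranking": 8, "scores": {1: 4, 2: 5}},
}


def distance(a, b):
    """Assumed metric: summed absolute differences over shared items,
    plus the absolute difference in self_ranking."""
    shared_items = a["scores"].keys() & b["scores"].keys()
    item_gap = sum(abs(a["scores"][i] - b["scores"][i]) for i in shared_items)
    return item_gap + abs(a["self_ranking"] - b["self_ranking"])


def nearest_user(target_id):
    """Return the id of the other user with the smallest distance."""
    target = users[target_id]
    others = (uid for uid in users if uid != target_id)
    return min(others, key=lambda uid: distance(target, users[uid]))


print(nearest_user(4))  # 1
```

Under this reading, user 1 is at distance 1 from user 4 (identical scores on items 1 and 2, self-rankings 9 and 8), which is why 1 is the expected answer.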
Answer 0 (score: 1)
As a slight refinement of @eggyal's approach, I have factored in the number of items we were able to match.
SELECT u2.id AS user_id
-- join our user to their scores (the users table in the question is keyed by id)
FROM (users u1 JOIN scores s1 ON s1.user_id = u1.id)
-- and then join other users and their scores
JOIN (users u2 JOIN scores s2 ON s2.user_id = u2.id)
ON s1.item_id = s2.item_id
AND u1.id != u2.id
-- filter for our user of interest
WHERE u1.id = ?
-- group other users' scores together
GROUP BY u2.id
-- subtract the degree of difference in correlating scores from the number of correlating scores
-- note: the self_ranking difference sits inside SUM, so it is counted once per matched item
ORDER BY (SUM(s1.item_id = s2.item_id) -
SUM(ABS(s2.score - s1.score) + ABS(u2.self_ranking - u1.self_ranking))) DESC
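For anyone who wants to try this ranking without a MySQL server, here is a rough sketch that runs the same idea against the question's sample data using Python's built-in `sqlite3` module. SQLite is only a stand-in here, so behaviour may differ in MySQL. The function name `rank_similar_users` and user 5 are hypothetical additions, and the users table is assumed to be keyed by `id` as shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, self_ranking INTEGER);
CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER,
                     item_id INTEGER, score INTEGER);
-- users 1-3 and user 4 come from the question;
-- user 5 is hypothetical, added only to give the ranking a second candidate
INSERT INTO users VALUES (1, 9), (2, 3), (3, 2), (4, 8), (5, 8);
INSERT INTO scores (user_id, item_id, score) VALUES
  (1, 1, 4), (1, 2, 5), (1, 3, 8), (1, 4, 3),
  (4, 1, 4), (4, 2, 5),
  (5, 1, 9);
""")

# Number of matched items minus the summed differences, highest first.
# As in the query above, the self_ranking gap is added once per matched item.
RANKING_SQL = """
SELECT u2.id AS user_id,
       COUNT(*) - SUM(ABS(s2.score - s1.score)
                      + ABS(u2.self_ranking - u1.self_ranking)) AS similarity
FROM users u1
JOIN scores s1 ON s1.user_id = u1.id
JOIN scores s2 ON s2.item_id = s1.item_id
JOIN users  u2 ON u2.id = s2.user_id AND u2.id != u1.id
WHERE u1.id = ?
GROUP BY u2.id
ORDER BY similarity DESC
"""


def rank_similar_users(user_id):
    """Return (user_id, similarity) rows, most similar first."""
    return conn.execute(RANKING_SQL, (user_id,)).fetchall()


print(rank_similar_users(4))  # [(1, 0), (5, -4)]
```

Users 2 and 3 do not appear in the result because they have no scored items in common with user 4, so the inner joins drop them.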
Answer 1 (score: 0)
SELECT u2.id AS user_id
-- join our user to their scores (the users table in the question is keyed by id)
FROM (users u1 JOIN scores s1 ON s1.user_id = u1.id)
-- and then join other users and their scores
JOIN (users u2 JOIN scores s2 ON s2.user_id = u2.id)
ON s1.item_id = s2.item_id
AND u1.id != u2.id
-- filter for our user of interest
WHERE u1.id = ?
-- group other users' scores together
GROUP BY u2.id
-- and here's the magic: order by ascending "distance" between
-- our selected user and all of the others, so the nearest user comes first;
-- you may wish to weight self_ranking differently to item scores, in which
-- case just multiply appropriately
ORDER BY SUM(ABS(s2.score - s1.score))
+ ABS(u2.self_ranking - u1.self_ranking) ASC
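The weighting mentioned in the comments can be sketched as follows, again using Python's `sqlite3` as a stand-in for MySQL. This is only an illustration under stated assumptions: the `weight` parameter, the function name `nearest_user`, and user 5 with its scores are all hypothetical, and `LIMIT 1` is added to return just the closest user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, self_ranking INTEGER);
CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER,
                     item_id INTEGER, score INTEGER);
-- users 1-4 follow the question; user 5 is hypothetical
INSERT INTO users VALUES (1, 9), (2, 3), (3, 2), (4, 8), (5, 8);
INSERT INTO scores (user_id, item_id, score) VALUES
  (1, 1, 4), (1, 2, 5), (1, 3, 8), (1, 4, 3),
  (4, 1, 4), (4, 2, 5),
  (5, 1, 6), (5, 2, 5);
""")

# distance = summed item-score gaps + weight * self_ranking gap, smallest first
NEAREST_SQL = """
SELECT u2.id AS user_id,
       SUM(ABS(s2.score - s1.score))
         + :weight * ABS(u2.self_ranking - u1.self_ranking) AS distance
FROM users u1
JOIN scores s1 ON s1.user_id = u1.id
JOIN scores s2 ON s2.item_id = s1.item_id
JOIN users  u2 ON u2.id = s2.user_id AND u2.id != u1.id
WHERE u1.id = :user_id
GROUP BY u2.id
ORDER BY distance ASC
LIMIT 1
"""


def nearest_user(user_id, weight=1):
    """Return (user_id, distance) for the closest other user, or None."""
    params = {"user_id": user_id, "weight": weight}
    return conn.execute(NEAREST_SQL, params).fetchone()


print(nearest_user(4, weight=1))  # (1, 1)
print(nearest_user(4, weight=3))  # (5, 2)
```

With equal weighting, user 1 is nearest to user 4. Once the self_ranking gap counts three times as much, the hypothetical user 5 (same self-ranking, slightly different item score) becomes the nearest instead.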