EDITED: Looking for SQL improvement

Time: 2013-03-26 02:30:16

Tags: mysql

Following the question "Collaborative filtering in MySQL?", I created the following:

CREATE TABLE `ub` (
  `user_id` int(11) NOT NULL,
  `book_id` varchar(10) NOT NULL,
  `rate` int(11) NOT NULL,
  PRIMARY KEY (`user_id`,`book_id`),
  UNIQUE KEY `book_id` (`book_id`,`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

insert into ub values (1, 'A', '8'), (1, 'B', '7'), (1, 'C', '10');
insert into ub values (2, 'A', '8'), (2, 'B', '7'), (2, 'C', '10'), (2,'D', '8'), (2,'X', '7');
insert into ub values (3, 'X', '10'), (3, 'Y', '8'), (3, 'C', '10'), (3,'Z', '10');
insert into ub values (4, 'W', '8'), (4, 'Q', '8'), (4, 'C', '10'), (4,'Z', '8');

With that, I can build the following table, and I understand how it works.

create temporary table ub_rank as 
select similar.user_id,count(*) rank
from ub target 
join ub similar on target.book_id= similar.book_id and target.user_id != similar.user_id and target.rate= similar.rate
where target.user_id = 1
group by similar.user_id;

select * from ub_rank;

+---------+------+
| user_id | rank |
+---------+------+
|       2 |    3 |
|       3 |    1 |
|       4 |    1 |
+---------+------+

However, I start to get confused after the following code.

select similar.rate, similar.book_id, sum(ub_rank.rank) total_rank
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.Rate= similar.Rate 
where target.book_id is null
group by similar.book_id
order by total_rank desc, rate desc;

+---------+------------+
| book_id | total_rank |
+---------+------------+
| X       |          4 |
| D       |          3 |
| Z       |          2 |
| Y       |          1 |
| Q       |          1 |
| W       |          1 |
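+---------+------------+

(Solved) First, I would like to know why X and D do not have the same total rank (that is, 3). Isn't the query counting the number of books that user A has in common with user B? In that case, shouldn't D and X both be 3?!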

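(Solved) Second, how should I modify the code so that the rate can act as an element of the ranking? That is, if 2 books have the same rank, the one with the higher rate should be ranked higher.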

Thanks

EDITED

(1,'A','8'),(1,'B','7'),(1,'C','10');

(2,'A','8'),(2,'B','7'),(2,'C','10'),(2,'D','8') ,(2,'X','7');

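What I want to do is this: assuming users 1 and 2 behave similarly (both chose A, B, and C, even before matching the ratings), I would recommend D to user 1, because it has the higher rate.

The code above does not seem to do that, because the top-ranked book is X.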

1 answer:

Answer 0 (score: 1)

  

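First, I would like to know why X and D do not have the same total rank (that is, 3). Isn't the query counting the number of books that user A has in common with user B? In that case, shouldn't D and X both be 3?!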

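X ranks higher because it appears for both the second and the third user_id. The query sums the ranks of every similar user who rated the book, which in this case is 3 (user_id = 2) + 1 (user_id = 3) = 4. D was rated only by user_id = 2, so its total stays at 3.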
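The sum can be broken down per contributing user. The following is a minimal sketch, not part of the original answer, and it assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL so that it runs without a server. The ub_rank temporary table is written as a CTE here, and the names SAMPLE_ROWS, BREAKDOWN_SQL, and contributions are made up for illustration.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_ROWS = [
    (1, "A", 8), (1, "B", 7), (1, "C", 10),
    (2, "A", 8), (2, "B", 7), (2, "C", 10), (2, "D", 8), (2, "X", 7),
    (3, "X", 10), (3, "Y", 8), (3, "C", 10), (3, "Z", 10),
    (4, "W", 8), (4, "Q", 8), (4, "C", 10), (4, "Z", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ub (user_id INTEGER NOT NULL, book_id TEXT NOT NULL, "
    "rate INTEGER NOT NULL, PRIMARY KEY (user_id, book_id))"
)
conn.executemany("INSERT INTO ub VALUES (?, ?, ?)", SAMPLE_ROWS)

# Same joins as the question's second query, but without GROUP BY,
# so each row shows which similar user contributes which rank to a book.
BREAKDOWN_SQL = """
WITH ub_rank AS (
    SELECT similar.user_id, COUNT(*) AS rank
    FROM ub target
    JOIN ub similar
      ON target.book_id = similar.book_id
     AND target.user_id != similar.user_id
     AND target.rate = similar.rate
    WHERE target.user_id = 1
    GROUP BY similar.user_id
)
SELECT similar.book_id, ub_rank.user_id, ub_rank.rank
FROM ub_rank
JOIN ub similar ON ub_rank.user_id = similar.user_id
LEFT JOIN ub target
  ON target.user_id = 1
 AND target.book_id = similar.book_id
 AND target.rate = similar.rate
WHERE target.book_id IS NULL
ORDER BY similar.book_id, ub_rank.user_id
"""

# Collect (user_id, rank) pairs per book and print their sum.
contributions = {}
for book_id, user_id, rank in conn.execute(BREAKDOWN_SQL):
    contributions.setdefault(book_id, []).append((user_id, rank))

for book_id, parts in contributions.items():
    print(book_id, parts, "total_rank =", sum(rank for _, rank in parts))
```

On the sample data, X collects 3 from user 2 plus 1 from user 3, while D collects only the 3 from user 2.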

  

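Second, how should I modify the code so that the rate can act as an element of the ranking? That is, if 2 books have the same rank, the one with the higher rate should be ranked higher.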

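Use the same query and sort by rate (descending) after the rank. Because the rows are grouped by book_id and both joined copies of ub have a rate column, the rate needs a table qualifier and an aggregate, for example max(similar.rate):

select similar.book_id, sum(ub_rank.rank) total_rank, max(similar.rate) max_rate
from ub_rank
join ub similar on ub_rank.user_id = similar.user_id 
left join ub target on target.user_id = 1 and target.book_id = similar.book_id and target.rate = similar.rate 
where target.book_id is null
group by similar.book_id
order by total_rank desc, max_rate desc;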
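The effect of a rate tie-break can be checked on the sample data. This is a minimal sketch under a few assumptions that are not in the original answer: SQLite (Python's sqlite3 module) stands in for MySQL, "higher rate" is taken to mean the highest rate any similar user gave the book (MAX(similar.rate)), and book_id is added as a last sort key so the order is fully determined. The names SAMPLE_ROWS, TIE_BREAK_SQL, and ranking are made up for illustration.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_ROWS = [
    (1, "A", 8), (1, "B", 7), (1, "C", 10),
    (2, "A", 8), (2, "B", 7), (2, "C", 10), (2, "D", 8), (2, "X", 7),
    (3, "X", 10), (3, "Y", 8), (3, "C", 10), (3, "Z", 10),
    (4, "W", 8), (4, "Q", 8), (4, "C", 10), (4, "Z", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ub (user_id INTEGER NOT NULL, book_id TEXT NOT NULL, "
    "rate INTEGER NOT NULL, PRIMARY KEY (user_id, book_id))"
)
conn.executemany("INSERT INTO ub VALUES (?, ?, ?)", SAMPLE_ROWS)

# total_rank is the primary sort key; the aggregated rate only breaks ties.
TIE_BREAK_SQL = """
WITH ub_rank AS (
    SELECT similar.user_id, COUNT(*) AS rank
    FROM ub target
    JOIN ub similar
      ON target.book_id = similar.book_id
     AND target.user_id != similar.user_id
     AND target.rate = similar.rate
    WHERE target.user_id = 1
    GROUP BY similar.user_id
)
SELECT similar.book_id,
       SUM(ub_rank.rank) AS total_rank,
       MAX(similar.rate) AS max_rate
FROM ub_rank
JOIN ub similar ON ub_rank.user_id = similar.user_id
LEFT JOIN ub target
  ON target.user_id = 1
 AND target.book_id = similar.book_id
 AND target.rate = similar.rate
WHERE target.book_id IS NULL
GROUP BY similar.book_id
ORDER BY total_rank DESC, max_rate DESC, similar.book_id
"""

ranking = conn.execute(TIE_BREAK_SQL).fetchall()
for book_id, total_rank, max_rate in ranking:
    print(book_id, total_rank, max_rate)
```

X still comes first with a total_rank of 4; the rate only decides between books whose total_rank is equal.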

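Update: based on your requirement, you need the list of books that best match other users and have the highest rate. Try the following query for that: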

SELECT
    temp.book_id,
    temp.rate as book_rate
FROM (
        SELECT 
            similar.user_id, 
            COUNT( similar.book_id ) as book_match_count
        FROM 
            ub target
            JOIN ub similar ON  target.book_id= similar.book_id AND target.user_id != similar.user_id
        WHERE 
            target.user_id = 1
        GROUP BY 
            similar.user_id
    ) AS users_with_book_matches
JOIN ub temp ON ( temp.user_id =users_with_book_matches.user_id AND temp.book_id NOT IN ( SELECT book_id FROM ub WHERE ub.user_id = 1 ) ) 
GROUP BY
    temp.book_id
ORDER BY 
    users_with_book_matches.book_match_count DESC,
    temp.rate DESC
limit 5

The query above returns the top 5 closest book matches.

Here is the SqlFiddle; make sure to change user_id in 2 places. I hope this serves your purpose.
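One caveat about the recommendation query: it selects and sorts by temp.rate and book_match_count while grouping only by temp.book_id, so the value picked for each book depends on MySQL's permissive GROUP BY handling, and the query is rejected when ONLY_FULL_GROUP_BY is enabled. A deterministic variant could look like the sketch below. It rests on assumptions that are mine and not the answer's: SQLite (Python's sqlite3 module) stands in for MySQL, each candidate book takes its rate from the most similar user or users who rated it, the alias cand replaces temp, book_id is added as a final tie-break, and a named parameter :user_id replaces the two hard-coded user ids.

```python
import sqlite3

# Sample data copied from the question.
SAMPLE_ROWS = [
    (1, "A", 8), (1, "B", 7), (1, "C", 10),
    (2, "A", 8), (2, "B", 7), (2, "C", 10), (2, "D", 8), (2, "X", 7),
    (3, "X", 10), (3, "Y", 8), (3, "C", 10), (3, "Z", 10),
    (4, "W", 8), (4, "Q", 8), (4, "C", 10), (4, "Z", 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ub (user_id INTEGER NOT NULL, book_id TEXT NOT NULL, "
    "rate INTEGER NOT NULL, PRIMARY KEY (user_id, book_id))"
)
conn.executemany("INSERT INTO ub VALUES (?, ?, ?)", SAMPLE_ROWS)

RECOMMEND_SQL = """
WITH users_with_book_matches AS (
    -- How many books each other user shares with the target user.
    SELECT similar.user_id, COUNT(similar.book_id) AS book_match_count
    FROM ub target
    JOIN ub similar
      ON target.book_id = similar.book_id
     AND target.user_id != similar.user_id
    WHERE target.user_id = :user_id
    GROUP BY similar.user_id
),
candidates AS (
    -- Books rated by those users that the target user has not rated.
    SELECT cand.book_id, cand.rate, m.book_match_count
    FROM users_with_book_matches m
    JOIN ub cand ON cand.user_id = m.user_id
    WHERE cand.book_id NOT IN (
        SELECT book_id FROM ub WHERE user_id = :user_id
    )
)
-- Keep, per book, only the rows from the best-matching user(s).
SELECT c.book_id, c.book_match_count, MAX(c.rate) AS book_rate
FROM candidates c
WHERE c.book_match_count = (
    SELECT MAX(c2.book_match_count)
    FROM candidates c2
    WHERE c2.book_id = c.book_id
)
GROUP BY c.book_id, c.book_match_count
ORDER BY c.book_match_count DESC, book_rate DESC, c.book_id
LIMIT 5
"""

recommendations = conn.execute(RECOMMEND_SQL, {"user_id": 1}).fetchall()
for book_id, book_match_count, book_rate in recommendations:
    print(book_id, book_match_count, book_rate)
```

With the sample data and user 1 as the target, D is listed before X: both come from user 2, who shares 3 books with user 1, and user 2 rated D higher (8) than X (7).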