I have the following SQL code to compute the Pearson correlation between two users' ratings:
select @u1avg:=avg(user1_rating),
@u2avg:=avg(user2_rating),
@u1sd:=stddev(user1_rating),
@u2sd:=stddev(user2_rating)
from
(select r1.userId as User1_id,r1.rating as User1_rating,
r2.userId as User2_id,r2.rating as User2_rating
from mydb.ratings r1 join mydb.ratings r2 on r1.itemId = r2.itemid
where r1.userId=1 and r2.userId=2) sample;
select (1/(count(r1.rating-1)))*sum(((r1.rating-@u1avg)/@u1sd)*((r2.rating-@u2avg)/@u2sd))*(count(r1.rating)/(1+count(r1.rating)))
from mydb.ratings r1 join mydb.ratings r2 on r1.itemId = r2.itemid
where r1.userId=1 and r2.userId=2;
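I would like to turn this into a function, e.g. corr(A,B). Any help would be useful.
The problem I run into is an error saying that `sample` is not allowed, or something along those lines, but if I remove `sample` I get an error saying that every derived table must have an alias.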
Answer 0 (score: 1)
I think you can drop the derived table from the first query, which should take care of that particular error:
SELECT
@u1avg:=avg(r1.rating),
@u2avg:=avg(r2.rating),
@u1sd:=stddev(r1.rating),
@u2sd:=stddev(r2.rating)
FROM mydb.ratings r1
INNER JOIN mydb.ratings r2
ON r1.itemId = r2.itemId
WHERE r1.userId=1
AND r2.userId=2;
SELECT (1/(COUNT(r1.rating-1)))*SUM(((r1.rating-@u1avg)/@u1sd)*((r2.rating-@u2avg)/@u2sd))*(COUNT(r1.rating)/(1+COUNT(r1.rating)))
FROM mydb.ratings r1
INNER JOIN mydb.ratings r2
ON r1.itemId = r2.itemid
WHERE r1.userId=1
AND r2.userId=2;
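To get closer to the corr(A,B) form asked for in the question, the two statements can probably be folded into a single stored function. The following is only a sketch under a few assumptions: the function name `pearson_corr` and the parameter names `p_user1` and `p_user2` are made up for illustration, `userId` is assumed to be an `INT`, and the formula is swapped for the textbook identity r = (E[xy] - E[x]E[y]) / (sd_x * sd_y) instead of the user-variable approach above. In MySQL, `STDDEV` is a synonym for `STDDEV_POP`, so the population form is used throughout. Note also that `COUNT(r1.rating-1)` in the queries above counts rows and does not subtract 1 from the count, so this sketch may return slightly different numbers than the original expression.

```sql
DELIMITER //

-- Hypothetical helper: Pearson correlation between the ratings of two users,
-- computed over the items that both users have rated.
CREATE FUNCTION pearson_corr(p_user1 INT, p_user2 INT)
RETURNS DOUBLE
READS SQL DATA
BEGIN
    DECLARE v_r DOUBLE;

    -- Population covariance divided by the product of the population
    -- standard deviations. NULLIF avoids a division by zero when one
    -- user gave the same rating to every shared item.
    SELECT (AVG(r1.rating * r2.rating) - AVG(r1.rating) * AVG(r2.rating))
           / NULLIF(STDDEV_POP(r1.rating) * STDDEV_POP(r2.rating), 0)
      INTO v_r
      FROM mydb.ratings r1
      INNER JOIN mydb.ratings r2
              ON r1.itemId = r2.itemId
     WHERE r1.userId = p_user1
       AND r2.userId = p_user2;

    -- NULL is returned when the users share no items or a deviation is zero.
    RETURN v_r;
END //

DELIMITER ;

-- Usage: correlation between users 1 and 2
SELECT pearson_corr(1, 2);
```

Because everything happens in one aggregate query, no derived table or session variables are needed, which sidesteps both the alias error and the restrictions on subqueries mentioned in the question.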