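I am trying to use SQL to calculate the Pearson correlation coefficient. This is the formula I am using: This is the table I am using:
Here is my query so far, but it fails with this error: Invalid use of group function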
select first_id, second_id, movie_id, first_score, second_score, count(*) as n,
sum((first_score-avg(first_score))*(second_score-avg(second_score)))/
(
sqrt(sum(first_score-avg(first_score)))*
sqrt(sum(second_score-avg(second_score))))
as pearson
from connections
group by second_id
Thanks for your help
Answer 0 (score: 2)
Here is a query that performs the calculation in the formula:
select sum((first_score - avg_first_score)*(second_score - avg_second_score)) /
(sqrt(sum(pow((first_score - avg_first_score), 2)))*
sqrt(sum(pow((second_score - avg_second_score), 2)))
) as r
from connections c cross join
(select avg(first_score) as avg_first_score, avg(second_score) as avg_second_score
from connections
) const;
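Your attempt has several problems. MySQL does not allow aggregate functions to be nested, so sum(... avg(...) ...) raises "Invalid use of group function", and the denominator leaves out the squares on the deviations. The query above precomputes the averages of the two scores in a subquery, then applies the formula almost exactly as written.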
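For reference, the formula image from the question is not reproduced here, so the following assumes the standard sample Pearson coefficient, which is what the query computes term by term. The symbols $x_i$ and $y_i$ are shorthand introduced here for first_score and second_score, and $\bar{x}$, $\bar{y}$ for avg_first_score and avg_second_score:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator corresponds to the first sum(...) in the query, and each square root in the denominator corresponds to one sqrt(sum(pow(..., 2))) term.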
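Answer 1 (score: 0)
From a purely syntactic point of view, your group by clause has a problem. It should list every non-aggregated column for the query to work. It should be: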
group by first_id, second_id, movie_id, first_score, second_score
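Grouping by the score columns as well puts almost every row in its own group, so it fixes the syntax but not the calculation. If the goal is one coefficient per (first_id, second_id) pair, which is only an assumption about what the question intends, a rough sketch could precompute the averages per pair and join them back. The aliases c, a, avg_first and avg_second are made up for illustration:

```sql
-- Sketch only: one Pearson coefficient per (first_id, second_id) pair.
-- Assumes each row of connections is one movie rated by both users.
select c.first_id,
       c.second_id,
       count(*) as n,
       sum((c.first_score - a.avg_first) * (c.second_score - a.avg_second)) /
       (sqrt(sum(pow(c.first_score - a.avg_first, 2))) *
        sqrt(sum(pow(c.second_score - a.avg_second, 2)))
       ) as pearson
from connections c
join (
    -- per-pair averages, computed once so no aggregate is nested in another
    select first_id,
           second_id,
           avg(first_score)  as avg_first,
           avg(second_score) as avg_second
    from connections
    group by first_id, second_id
) a
  on a.first_id = c.first_id
 and a.second_id = c.second_id
group by c.first_id, c.second_id;
```

Here every selected column is either in the group by list or inside an aggregate, so the syntactic point above is satisfied as well. Pairs whose scores have zero variance make the denominator zero, so they need separate handling.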