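I have a table that lists, for each film, the score each critic gave it: (film_id, critic_id, score). I have the following PostgreSQL query to find the 10 films with the highest average score across a set of critics: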
SELECT
    f_id, avg(f_score)
FROM
    (
        SELECT
            s.film_id as f_id, s.critic_id as c_id, s.score as f_score
        FROM
            score s
        WHERE
            s.critic_id = ANY(ARRAY['CRITIC_BOB_0213', 'CRITIC_AMY_9671'])
        GROUP BY
            s.film_id, s.critic_id, s.score
    ) sub
GROUP BY
    f_id
ORDER BY
    avg desc
LIMIT
    10;
Here the user has said he wants the scores from critics Bob and Amy, and the query returns:
 f_id     | avg
----------+------------------
 "742545" | 13.0330650266333
 "220176" | 6.7783259974
 "662682" | 6.52305498088333
 ...
Now I want the user to be able to give each critic a certain weight.
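So basically, the user provides input such as [('CRITIC_BOB_0213', 0.923), ('CRITIC_AMY_9671', 0.212)] (for example, if he trusts Bob's judgment more than Amy's), and I need the query to reflect that, giving a weighted average: avg(score_bob*0.923 + score_amy*0.212). This has to happen in the query itself: there are millions of films, and I don't want to return all of them and then compute the weighted average in my backend code.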
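A short note on the formula, using notation introduced here only for illustration (C_f is the set of selected critics who scored film f, w_c is the user's weight for critic c, s_{f,c} is that critic's score): the conventional weighted average divides by the sum of the weights, whereas avg() over weighted rows divides by the number of rows.

```latex
\bar{s}_f = \frac{\sum_{c \in C_f} w_c \, s_{f,c}}{\sum_{c \in C_f} w_c}
\qquad\text{vs.}\qquad
\operatorname{avg}_{c \in C_f}\bigl(w_c \, s_{f,c}\bigr) = \frac{1}{\lvert C_f \rvert} \sum_{c \in C_f} w_c \, s_{f,c}
```

With w_Bob = 0.923 and w_Amy = 0.212 and both critics having scored f, the first is (0.923 s_Bob + 0.212 s_Amy) / 1.135 and the second is (0.923 s_Bob + 0.212 s_Amy) / 2. Both are the expression above divided by a constant, so they rank such films identically; they diverge once films are scored by different subsets of the selected critics.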
Is this possible in PostgreSQL?
Answer 0 (score: 0)
Solved it myself with the following:
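SELECT
    f_id, avg(weighted_score) AS weighted_avg
FROM
    (
        SELECT
            s.film_id AS f_id,
            -- Multiply each score by the weight the user gave that critic
            -- (0.923 and 0.212 are the example weights from the question).
            CASE
                WHEN s.critic_id = 'CRITIC_BOB_0213' THEN s.score * 0.923
                WHEN s.critic_id = 'CRITIC_AMY_9671' THEN s.score * 0.212
                -- No ELSE: any other critic yields NULL, which avg() ignores.
            END AS weighted_score
        FROM
            score s
        WHERE
            s.critic_id = ANY(ARRAY['CRITIC_BOB_0213', 'CRITIC_AMY_9671'])
        -- s.score is grouped too (as in the question); otherwise PostgreSQL
        -- rejects it unless (film_id, critic_id) is the table's primary key.
        GROUP BY
            s.film_id, s.critic_id, s.score
    ) sub
GROUP BY
    f_id
ORDER BY
    weighted_avg DESC
LIMIT
    10;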
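The CASE version hard-codes one branch per critic. A minimal sketch that instead passes the user's (critic_id, weight) pairs as query data, joined through a VALUES list in a CTE, is below. It uses Python's built-in sqlite3 with made-up rows only so that it runs without a PostgreSQL server; the toy data and the helper name top_films are hypothetical. It returns both the answer's avg(score * weight) and a normalized weighted mean, sum(score * weight) / sum(weight). The SQL text should also work on PostgreSQL, though drivers such as psycopg2 use %s rather than ? as the placeholder.

```python
import sqlite3

# Hypothetical toy data standing in for the real `score` table.
ROWS = [
    ("742545", "CRITIC_BOB_0213", 14.0),
    ("742545", "CRITIC_AMY_9671", 10.0),
    ("220176", "CRITIC_BOB_0213", 6.0),
    ("220176", "CRITIC_OTHER_0001", 9.0),  # a critic the user did not select
]

# The (critic_id, weight) pairs the user supplies, as in the question.
WEIGHTS = [("CRITIC_BOB_0213", 0.923), ("CRITIC_AMY_9671", 0.212)]


def top_films(conn, weights, limit=10):
    """Rank films by weighted score; assumes at least one weight pair."""
    # One "(?, ?)" row per pair. The inner join drops unselected critics,
    # so no WHERE filter or per-critic CASE branch is needed.
    values = ", ".join(["(?, ?)"] * len(weights))
    sql = f"""
        WITH w(critic_id, weight) AS (VALUES {values})
        SELECT s.film_id,
               AVG(s.score * w.weight) AS avg_weighted,
               SUM(s.score * w.weight) / SUM(w.weight) AS weighted_mean
        FROM score s
        JOIN w ON w.critic_id = s.critic_id
        GROUP BY s.film_id
        ORDER BY weighted_mean DESC
        LIMIT ?
    """
    # Flatten the pairs into positional parameters, then append the limit.
    params = [x for pair in weights for x in pair] + [limit]
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (film_id TEXT, critic_id TEXT, score REAL)")
conn.executemany("INSERT INTO score VALUES (?, ?, ?)", ROWS)
for row in top_films(conn, WEIGHTS):
    print(row)
```

The two columns diverge for film 220176, which only Bob rated among the selected critics: avg(score * weight) gives 6 × 0.923 ≈ 5.54, while the normalized mean returns Bob's raw 6.0. Which one is appropriate depends on whether a missing review should count against a film.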
Hope this helps someone in the future.