I have 3 tables:
(1) films
id title
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE
(2) genres
id film_id genre
1 1 Action
2 1 Comedy
3 1 Horror
4 2 Action
5 2 Comedy
6 3 Action
7 3 Drama
8 4 Sci-Fi
9 4 Drama
10 4 Western
10 5 Romance
10 5 Musical
10 5 Avant-Garde
(3) directors
id film_id director
1 1 John Smith
2 2 John Smith
3 2 Ann Coates
4 3 Tom Jones
5 4 Ann Coates
6 5 John Smith
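I am writing an algorithm that scores every film by how closely it matches film #1: each matching genre scores 5 points and each matching director scores 100 points.
When I compare only two tables, films and genres, this query gives the expected results: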
SELECT f1.id as original_film_id, f2.id as matching_film_id, SUM(if(g1.genre = g2.genre,5,0)) as score
FROM films f1
JOIN films f2
LEFT JOIN genres g1 ON f1.id = g1.film_id
LEFT JOIN genres g2 ON f2.id = g2.film_id
WHERE f1.id = 1
GROUP BY f2.id
HAVING score > 0
ORDER BY score DESC;
Results:
original_film_id matching_film_id score
1 1 15
1 2 10
1 3 5
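That is, the 3 genres of film #1 match (obviously) all 3 genres of film #1, 2 of the genres of film #2, and 1 of the genres of film #3.
However, when I add the directors table with this query, I don't understand the results: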
SELECT f1.id as original_film_id, f2.id as matching_film_id,
SUM(if(g1.genre = g2.genre,5,0))
+ SUM(IF(d1.director = d2.director,100,0)) as score
FROM films f1
JOIN films f2
LEFT JOIN genres g1 ON f1.id = g1.film_id
LEFT JOIN genres g2 ON f2.id = g2.film_id
LEFT JOIN directors d1 ON f1.id = d1.film_id
LEFT JOIN directors d2 ON f2.id = d2.film_id
WHERE f1.id = 1
GROUP BY f2.id
HAVING score > 0
ORDER BY score DESC;
I expected to see these results:
original_film_id matching_film_id score
1 1 115
1 2 110
1 5 100
1 3 5
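...because film #1 has the same genres and the same director as itself, film #2 has 2 of the same genres and the same director, film #5 has the same director but no matching genres, and so on.
But I see these results instead: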
original_film_id matching_film_id score
1 1 915
1 2 620
1 5 300
1 3 5
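I simply can't figure out why! Thanks for any help.
Answer 0 (score: 0)
Because the joins pair every genre row with every director row, each match is counted several times, so the score is inflated. You can see this by removing the GROUP BY and the SUMs: the query then lists every row that feeds into the sums.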
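The over-counting can be made visible with a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, uses CASE in place of MySQL's IF(), and loads only the rows for films #1 and #2. The column names joined_rows, genre_hits, and director_hits are made up for illustration. The joins are the ones from the question, but the query counts rows instead of scoring them:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE films (id INTEGER, title TEXT);
CREATE TABLE genres (id INTEGER, film_id INTEGER, genre TEXT);
CREATE TABLE directors (id INTEGER, film_id INTEGER, director TEXT);
-- Only films #1 and #2 from the question are loaded.
INSERT INTO films VALUES (1, 'AAA'), (2, 'BBB');
INSERT INTO genres VALUES
  (1, 1, 'Action'), (2, 1, 'Comedy'), (3, 1, 'Horror'),
  (4, 2, 'Action'), (5, 2, 'Comedy');
INSERT INTO directors VALUES
  (1, 1, 'John Smith'), (2, 2, 'John Smith'), (3, 2, 'Ann Coates');
""")

# Same joins as the question's second query; count the joined rows and the
# rows where a genre or a director matches, instead of summing points.
rows = conn.execute("""
SELECT f2.id AS matching_film_id,
       COUNT(*) AS joined_rows,
       SUM(CASE WHEN g1.genre = g2.genre THEN 1 ELSE 0 END) AS genre_hits,
       SUM(CASE WHEN d1.director = d2.director THEN 1 ELSE 0 END) AS director_hits
FROM films f1
JOIN films f2
LEFT JOIN genres g1 ON f1.id = g1.film_id
LEFT JOIN genres g2 ON f2.id = g2.film_id
LEFT JOIN directors d1 ON f1.id = d1.film_id
LEFT JOIN directors d2 ON f2.id = d2.film_id
WHERE f1.id = 1
GROUP BY f2.id
ORDER BY f2.id
""").fetchall()

# Rebuild the inflated score from the hit counts: 5 per genre hit, 100 per director hit.
scores = {film: 5 * g_hits + 100 * d_hits for film, _, g_hits, d_hits in rows}

for film, joined, g_hits, d_hits in rows:
    print(film, joined, g_hits, d_hits, scores[film])
```

For film #2, each of the 3 × 2 genre pairs is repeated for both of its director rows, giving 12 joined rows. Of these, 4 have a matching genre and 6 have a matching director, so 4 × 5 + 6 × 100 = 620, the score reported in the question. For film #1, the single director match is repeated across all 9 genre pairs, giving 3 × 5 + 9 × 100 = 915.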
You can compute the genre score and the director score independently and then combine them.
select id, film_id, sum(s) as score from (
select a.id, c.film_id, sum(5) s
from films a
left join genres b on(a.id = b.film_id)
left join genres c on(b.genre = c.genre)
where a.id = 1
group by a.id, c.film_id
union all
select a.id, c.film_id, sum(100) s
from films a
left join directors b on(a.id = b.film_id)
left join directors c on(b.director = c.director)
where a.id = 1
group by a.id, c.film_id
) q
group by id, film_id
order by id, score desc
;
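As a quick check, the query above can be run against the sample data from the question. This is only a sketch: it assumes SQLite (through Python's sqlite3 module) in place of MySQL, and it omits the id columns of genres and directors because the query never reads them.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE films (id INTEGER, title TEXT);
-- Simplification: the id columns of genres and directors are left out.
CREATE TABLE genres (film_id INTEGER, genre TEXT);
CREATE TABLE directors (film_id INTEGER, director TEXT);
INSERT INTO films VALUES
  (1, 'AAA'), (2, 'BBB'), (3, 'CCC'), (4, 'DDD'), (5, 'EEE');
INSERT INTO genres VALUES
  (1, 'Action'), (1, 'Comedy'), (1, 'Horror'),
  (2, 'Action'), (2, 'Comedy'),
  (3, 'Action'), (3, 'Drama'),
  (4, 'Sci-Fi'), (4, 'Drama'), (4, 'Western'),
  (5, 'Romance'), (5, 'Musical'), (5, 'Avant-Garde');
INSERT INTO directors VALUES
  (1, 'John Smith'), (2, 'John Smith'), (2, 'Ann Coates'),
  (3, 'Tom Jones'), (4, 'Ann Coates'), (5, 'John Smith');
""")

# The answer's query: score genres and directors separately, then add them up.
result = conn.execute("""
select id, film_id, sum(s) as score from (
  select a.id, c.film_id, sum(5) s
  from films a
  left join genres b on (a.id = b.film_id)
  left join genres c on (b.genre = c.genre)
  where a.id = 1
  group by a.id, c.film_id
  union all
  select a.id, c.film_id, sum(100) s
  from films a
  left join directors b on (a.id = b.film_id)
  left join directors c on (b.director = c.director)
  where a.id = 1
  group by a.id, c.film_id
) q
group by id, film_id
order by id, score desc
""").fetchall()

for original_film_id, matching_film_id, score in result:
    print(original_film_id, matching_film_id, score)
```

With this data the rows come back as 115, 110, 100, and 5 for films #1, #2, #5, and #3, which are the scores the question expected. Each branch of the UNION ALL joins only one kind of attribute, so a genre match is never multiplied by the number of director rows, or the reverse.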