Using MySQL

Posted: 2015-11-20 04:56:32

Tags: mysql mysql-workbench

I have 3 tables:

(1) films

id  title
1   AAA
2   BBB
3   CCC
4   DDD
5   EEE

(2) genres

id  film_id genre
1   1       Action
2   1       Comedy
3   1       Horror
4   2       Action
5   2       Comedy
6   3       Action
7   3       Drama
8   4       Sci-Fi
9   4       Drama
10  4       Western
11  5       Romance
12  5       Musical
13  5       Avant-Garde

(3) directors

id  film_id director
1   1       John Smith
2   2       John Smith
3   2       Ann Coates
4   3       Tom Jones
5   4       Ann Coates
6   5       John Smith

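I'm writing an algorithm that gives me a score for how closely each film matches film #1: every matching genre scores 5 points, and every matching director scores 100 points.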
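As a reference point, the intended scoring can be sketched in plain Python with sets. This assumes the table contents above are copied by hand into dictionaries keyed by film_id; the names `GENRES`, `DIRECTORS`, `similarity` and `ranked_matches` are hypothetical and not part of the original question. Each shared genre adds 5 points and each shared director adds 100, and films with a score of 0 are dropped, like the `HAVING score > 0` clause.

```python
# Hypothetical in-memory copies of the genres and directors tables above,
# keyed by film_id.
GENRES = {
    1: {"Action", "Comedy", "Horror"},
    2: {"Action", "Comedy"},
    3: {"Action", "Drama"},
    4: {"Sci-Fi", "Drama", "Western"},
    5: {"Romance", "Musical", "Avant-Garde"},
}
DIRECTORS = {
    1: {"John Smith"},
    2: {"John Smith", "Ann Coates"},
    3: {"Tom Jones"},
    4: {"Ann Coates"},
    5: {"John Smith"},
}


def similarity(original: int, candidate: int) -> int:
    """5 points per shared genre plus 100 points per shared director."""
    shared_genres = GENRES[original] & GENRES[candidate]
    shared_directors = DIRECTORS[original] & DIRECTORS[candidate]
    return 5 * len(shared_genres) + 100 * len(shared_directors)


def ranked_matches(original: int) -> list[tuple[int, int]]:
    """(film_id, score) pairs with a positive score, best match first."""
    scores = [(film, similarity(original, film)) for film in GENRES]
    positive = [pair for pair in scores if pair[1] > 0]
    return sorted(positive, key=lambda pair: pair[1], reverse=True)


print(ranked_matches(1))  # [(1, 115), (2, 110), (5, 100), (3, 5)]
```

Because each film's genres and directors are treated as sets, every shared value is counted exactly once, which is the behavior the expected results below rely on.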

When I compare only two tables, films and genres, this query gives the expected results:

SELECT f1.id as original_film_id, f2.id as matching_film_id, SUM(if(g1.genre = g2.genre,5,0)) as score 
FROM films f1
JOIN films f2
LEFT JOIN genres g1 ON f1.id = g1.film_id
LEFT JOIN genres g2 ON f2.id = g2.film_id
WHERE f1.id = 1 
GROUP BY f2.id
HAVING score > 0
ORDER BY score DESC;

Result:

original_film_id    matching_film_id    score
1                   1                   15
1                   2                   10
1                   3                   5

That is, the 3 genres of film #1 match the 3 genres of film #1 (obviously), 2 genres of film #2, and 1 genre of film #3.

However, when I add the directors table with this query, I don't understand the results:

SELECT f1.id as original_film_id, f2.id as matching_film_id, 
SUM(if(g1.genre = g2.genre,5,0)) 
+ SUM(IF(d1.director = d2.director,100,0)) as score
FROM films f1
JOIN films f2
LEFT JOIN genres g1 ON f1.id = g1.film_id
LEFT JOIN genres g2 ON f2.id = g2.film_id
LEFT JOIN directors d1 ON f1.id = d1.film_id
LEFT JOIN directors d2 ON f2.id = d2.film_id
WHERE f1.id = 1 
GROUP BY f2.id
HAVING score > 0
ORDER BY score DESC;

I expected to see these results:

original_film_id    matching_film_id    score
1                   1                   115
1                   2                   110
1                   5                   100
1                   3                   5

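...because film #1 has the same genres and the same director, film #2 has 2 of the same genres and the same director, film #5 has the same director but no matching genres, and so on.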

But I see these results instead:

original_film_id    matching_film_id    score
1                   1                   915
1                   2                   620
1                   5                   300
1                   3                   5

I simply can't figure out why! Thanks for any help.

1 Answer:

Answer 0 (score: 0)

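Because you are matching many rows (genre rows against director rows), you are over-counting the scores. For each candidate film the four joins produce (genres of film #1) × (genres of the candidate) × (directors of film #1) × (directors of the candidate) rows, so every genre match is counted once per director pair and every director match once per genre pair. For film #1 compared with itself that is 3 × 3 × 1 × 1 = 9 rows: all 9 have matching directors (9 × 100 = 900) and 3 have matching genres (3 × 5 = 15), giving 915. You can see this if you remove the GROUP BY and the SUMs: the query will then list every row that feeds into the sums.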
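To see why the scores are inflated, the sketch below rebuilds in plain Python the cross product that the four LEFT JOINs produce for one candidate film, then applies both IF expressions to every row, as the two SUMs do. The row lists `GENRE_ROWS` and `DIRECTOR_ROWS` are hypothetical hand copies of the rows for films #1 and #2 only, and `joined_score` is an illustrative name, not a MySQL feature.

```python
from itertools import product

# Hypothetical copies of the genres and directors rows for films 1 and 2.
GENRE_ROWS = {
    1: ["Action", "Comedy", "Horror"],
    2: ["Action", "Comedy"],
}
DIRECTOR_ROWS = {
    1: ["John Smith"],
    2: ["John Smith", "Ann Coates"],
}


def joined_score(original: int, candidate: int) -> tuple[int, int]:
    """Return (row_count, score) for the four-way join in the question's query."""
    # One tuple per joined row: (g1.genre, g2.genre, d1.director, d2.director).
    rows = list(product(GENRE_ROWS[original], GENRE_ROWS[candidate],
                        DIRECTOR_ROWS[original], DIRECTOR_ROWS[candidate]))
    score = 0
    for g1, g2, d1, d2 in rows:
        score += 5 if g1 == g2 else 0      # SUM(IF(g1.genre = g2.genre, 5, 0))
        score += 100 if d1 == d2 else 0    # SUM(IF(d1.director = d2.director, 100, 0))
    return len(rows), score


print(joined_score(1, 1))  # (9, 915)
print(joined_score(1, 2))  # (12, 620)
```

For film #2 the join yields 3 × 2 × 1 × 2 = 12 rows; each of the 2 genre matches appears once per director row of film #2 (4 × 5 = 20), and "John Smith" matches in 3 × 2 = 6 rows (600), which reproduces the 620 in the question.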

You can calculate the genre score and the director score separately, then combine them:

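select id, film_id, sum(s) as score from (
  -- genre score: 5 points for every genre film #1 shares with film c.film_id
  select      a.id, c.film_id, sum(5) s
  from        films a
  left join   genres b on(a.id = b.film_id)
  left join   genres c on(b.genre = c.genre)
  where a.id = 1
  group by a.id, c.film_id
  union all
  -- director score: 100 points for every director film #1 shares with film c.film_id
  select      a.id, c.film_id, sum(100) s
  from        films a
  left join   directors b on(a.id = b.film_id)
  left join   directors c on(b.director = c.director)
  where a.id = 1
  group by a.id, c.film_id
) q
-- add the genre and director partial scores for each matching film
group by id, film_id
order by id, score desc
;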
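One way to check the answer's query without a MySQL server is Python's built-in sqlite3 module with an in-memory database. This is a sketch under the assumption that SQLite is an acceptable stand-in: the query only uses LEFT JOIN, UNION ALL and GROUP BY, so it runs unchanged, but this exercises SQLite's dialect rather than MySQL's. The names `ANSWER_QUERY`, `GENRE_ROWS`, `DIRECTOR_ROWS` and `load_tables` are hypothetical.

```python
import sqlite3

# The answer's query (without its comments).
ANSWER_QUERY = """
select id, film_id, sum(s) as score from (
  select      a.id, c.film_id, sum(5) s
  from        films a
  left join   genres b on(a.id = b.film_id)
  left join   genres c on(b.genre = c.genre)
  where a.id = 1
  group by a.id, c.film_id
  union all
  select      a.id, c.film_id, sum(100) s
  from        films a
  left join   directors b on(a.id = b.film_id)
  left join   directors c on(b.director = c.director)
  where a.id = 1
  group by a.id, c.film_id
) q
group by id, film_id
order by id, score desc
"""

# (film_id, value) rows copied from the question's tables.
GENRE_ROWS = [(1, "Action"), (1, "Comedy"), (1, "Horror"), (2, "Action"),
              (2, "Comedy"), (3, "Action"), (3, "Drama"), (4, "Sci-Fi"),
              (4, "Drama"), (4, "Western"), (5, "Romance"), (5, "Musical"),
              (5, "Avant-Garde")]
DIRECTOR_ROWS = [(1, "John Smith"), (2, "John Smith"), (2, "Ann Coates"),
                 (3, "Tom Jones"), (4, "Ann Coates"), (5, "John Smith")]


def load_tables(conn: sqlite3.Connection) -> None:
    """Create the three tables and insert the question's rows."""
    conn.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE genres (id INTEGER PRIMARY KEY, film_id INTEGER, genre TEXT)")
    conn.execute("CREATE TABLE directors (id INTEGER PRIMARY KEY, film_id INTEGER, director TEXT)")
    conn.executemany("INSERT INTO films (id, title) VALUES (?, ?)",
                     [(1, "AAA"), (2, "BBB"), (3, "CCC"), (4, "DDD"), (5, "EEE")])
    conn.executemany("INSERT INTO genres (film_id, genre) VALUES (?, ?)", GENRE_ROWS)
    conn.executemany("INSERT INTO directors (film_id, director) VALUES (?, ?)", DIRECTOR_ROWS)


conn = sqlite3.connect(":memory:")
load_tables(conn)
rows = conn.execute(ANSWER_QUERY).fetchall()
print(rows)  # [(1, 1, 115), (1, 2, 110), (1, 5, 100), (1, 3, 5)]
conn.close()
```

Each branch of the UNION ALL groups its own join, so a genre match is never multiplied by director rows (or vice versa), and the outer GROUP BY only adds the two partial scores together.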