Filtering a query

Date: 2016-03-20 00:29:31

Tags: sql join hql

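I am trying to write a Hive query that returns a.user_id, b.user_id, and the number of movies that both users liked. When I run the query, I get a.user_id, b.user_id, the count, and a set of movies. I also get the reversed row: b.user_id, a.user_id, the count, and a set of movies.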

My question is how to restrict the query so that it returns each pair only once: a.user_id, b.user_id, the count, and the set of movies.

I have already tried a solution suggested in "Remove reverse duplicates from an SQL query", changing line 4 to ON (a.movie_id = b.movie_id AND a.user_id < b.user_id).

Here is my query:

SELECT a.user_id, b.user_id, count(*) AS num, collect_set(m.movie_title)
FROM ratings a
JOIN ratings b
ON (a.movie_id = b.movie_id)
JOIN movies m
ON (a.movie_id = m.movie_id AND b.movie_id = m.movie_id)
WHERE (a.user_id <> b.user_id)
GROUP BY a.user_id, b.user_id
ORDER BY num DESC;

Current output:

A, B, 25, list of movies

B, A, 25, list of movies

Desired output:

A, B, 25, list of movies

1 Answer:

Answer 0: (score: 1)

I think this is the query you want:

SELECT a.user_id, b.user_id, count(*) AS num, collect_set(m.movie_title)
FROM ratings a JOIN
     ratings b
     ON a.movie_id = b.movie_id JOIN
     movies m
     ON a.movie_id = m.movie_id
WHERE a.user_id < b.user_id
GROUP BY a.user_id, b.user_id
ORDER BY num DESC;
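
The self-join produces every matching pair twice, once as (A, B) and once as (B, A). The condition a.user_id <> b.user_id only drops self-pairs, while a.user_id < b.user_id also keeps a single ordering of each pair. The sketch below is one way to check this on a small data set. It is not Hive: it assumes Python's built-in sqlite3 module as a stand-in engine, the sample rows are made up for illustration, and collect_set is left out because SQLite has no such function.

```python
import sqlite3

# Hypothetical sample data: SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies  (movie_id INTEGER, movie_title TEXT);
CREATE TABLE ratings (user_id TEXT, movie_id INTEGER);
INSERT INTO movies  VALUES (1, 'Alien'), (2, 'Brazil'), (3, 'Casablanca');
INSERT INTO ratings VALUES
    ('A', 1), ('A', 2), ('A', 3),
    ('B', 1), ('B', 2),
    ('C', 3);
""")

# Same shape as the query in the answer, minus collect_set.
# {op} is replaced by the comparison operator under test.
PAIR_QUERY = """
SELECT a.user_id, b.user_id, COUNT(*) AS num
FROM ratings a
JOIN ratings b ON a.movie_id = b.movie_id
JOIN movies m ON a.movie_id = m.movie_id
WHERE a.user_id {op} b.user_id
GROUP BY a.user_id, b.user_id
ORDER BY num DESC, a.user_id, b.user_id
"""

def shared_movie_pairs(op):
    """Run the pair query with the given comparison operator."""
    return conn.execute(PAIR_QUERY.format(op=op)).fetchall()

both_orders = shared_movie_pairs("<>")  # each pair appears in both orders
one_order = shared_movie_pairs("<")     # each pair appears once

print(both_orders)
print(one_order)
```

With this data the <> version returns four rows (A/B and A/C, each in both orders) and the < version returns two. The comparison relies on user_id values having a total order, which holds for both numeric and string ids.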