I have two files: posts and users. I need to get the top 10 users by number of posts, which in SQL would be:
SELECT us.name, COUNT(po.id) AS NumberOfPost
FROM User us INNER JOIN Post po ON po.userId = us.id
GROUP BY us.name
ORDER BY NumberOfPost DESC;
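Can this be done with a single job, rather than one job for the JOIN and another job for the top ten? I have to follow the "top ten" MapReduce pattern, but in this case I am not required to follow any particular join pattern. Is there a way to do it using only one job?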
Answer 0 (score: 1)
It is best to implement this in Hive. Run the query below to get the top 10:
SELECT us.name, COUNT(po.id) AS NumberOfPost
FROM User us INNER JOIN Post po ON po.userId = us.id
GROUP BY us.name
ORDER BY NumberOfPost DESC
LIMIT 10;
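If Hive is not an option, one way the same result might be produced in a single job is a replicated (map-side) join, assuming the users file is small enough to load into memory, followed by counting and a top-N selection in one reducer. The Python sketch below only simulates that flow locally and is not Hadoop code. The function name `top_users` and the tab-separated layouts (`id<TAB>name` for users, `id<TAB>userId` for posts) are assumptions made for illustration.

```python
import heapq
from collections import Counter


def top_users(user_lines, post_lines, n=10):
    """Simulate one job: in-memory join + count per name + top-n.

    Hypothetical input layouts (not from the original post):
      users: id<TAB>name
      posts: id<TAB>userId
    """
    # "Setup" phase: load the (assumed small) users file into memory.
    names = {}
    for line in user_lines:
        user_id, name = line.rstrip("\n").split("\t")[:2]
        names[user_id] = name

    # "Map" + "reduce" phases: count posts per user name.
    # Posts whose userId has no matching user are dropped (inner join).
    # Counting by name mirrors GROUP BY us.name in the query.
    counts = Counter()
    for line in post_lines:
        _post_id, user_id = line.rstrip("\n").split("\t")[:2]
        if user_id in names:
            counts[names[user_id]] += 1

    # "Cleanup" phase: keep only the n names with the most posts.
    return heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    users = ["1\talice", "2\tbob", "3\tcarol"]
    posts = ["10\t1", "11\t1", "12\t2", "13\t9"]
    for name, number_of_posts in top_users(users, posts, n=2):
        print(name, number_of_posts)
```

In a real cluster the final selection only works in one job if all counts reach a single reducer, which is an assumption of this sketch. With a large users file, a reduce-side join and a second job would likely be needed.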