How to write the following query in HIVE

Date: 2017-02-05 18:14:00

Tags: sql hadoop hive


I wrote the following query in HIVE:

SELECT title, rating FROM 
( 
    SELECT m.title as title, variance(r.rating) as var, r.rating as rating, r.time_stamp as time_stamp
    FROM movies m JOIN ratings r ON m.movieid = r.movieid
    DISTRIBUTE BY m.title, r.rating
    GROUP BY m.title
    SORT BY m.title, r.rating
) A
WHERE year(from_unixtime(time_stamp)) = '2015'
GROUP BY title
LIMIT 10;


But I get the following error:

Error while compiling statement: FAILED: ParseException line 6:4 missing ) at 'GROUP' near 'GROUP' line 6:10 missing EOF at 'BY' near 'GROUP'

2 answers:

Answer 0 (score: 0)

I think this is what you want:

SELECT m.movieid, m.title, variance(r.rating) as var
FROM movies m JOIN
     ratings r
     ON m.movieid = r.movieid
WHERE year(from_unixtime(time_stamp)) = 2015
GROUP BY m.movieid, m.title
ORDER BY var DESC
LIMIT 10;

Answer 1 (score: 0)

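Patrick, it's still SQL:
- You cannot select columns that are not part of the GROUP BY (unless they are wrapped in an aggregate function)
- YEAR returns an integer, so compare it with 2015 rather than '2015' (P.S. is the ratings table not partitioned?)
- You should have a good reason to use DISTRIBUTE BY / SORT BY, low-level clauses that date back to Hive's early days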

select      m.title
           ,r.var
from        (select     r.movieid
                       ,variance(r.rating) as var
             from       ratings as r
             where      year(from_unixtime(r.time_stamp)) = 2015
             group by   r.movieid
             order by   var desc
             limit      10
            ) as r
            join        movies as m
            on          m.movieid = r.movieid
-- the join does not preserve the subquery's ordering, so sort the final result again
order by    r.var desc
;
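To compare the two answers without a Hive cluster, here is a minimal Python sketch that runs both query shapes against an in-memory SQLite database. It is an approximation under stated assumptions: SQLite has neither `variance` nor `from_unixtime`, so the sketch registers a population-variance aggregate named `variance` (Hive's `variance` returns the population variance, like `var_pop`) and a hypothetical helper `unix_year` standing in for `year(from_unixtime(...))`. The helper works in UTC, whereas Hive converts timestamps in its configured local time zone. The table rows are made-up sample data.

```python
import sqlite3
from datetime import datetime, timezone


class Variance:
    """Population variance aggregate, mimicking Hive's variance() (= var_pop)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        if value is not None:  # aggregates ignore NULLs, as in Hive
            self.values.append(value)

    def finalize(self):
        if not self.values:
            return None
        mean = sum(self.values) / len(self.values)
        return sum((v - mean) ** 2 for v in self.values) / len(self.values)


def unix_year(ts):
    # Hypothetical stand-in for Hive's year(from_unixtime(ts)); assumes UTC.
    return datetime.fromtimestamp(ts, tz=timezone.utc).year


def epoch(year, month, day):
    # Unix timestamp (seconds) for midnight UTC on the given date.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("variance", 1, Variance)
    conn.create_function("unix_year", 1, unix_year)
    conn.execute("CREATE TABLE movies (movieid INTEGER, title TEXT)")
    conn.execute("CREATE TABLE ratings (movieid INTEGER, rating REAL, time_stamp INTEGER)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
    conn.executemany(
        "INSERT INTO ratings VALUES (?, ?, ?)",
        [
            (1, 1.0, epoch(2015, 3, 1)),
            (1, 5.0, epoch(2015, 9, 1)),
            (2, 3.0, epoch(2015, 4, 1)),
            (2, 4.0, epoch(2015, 5, 1)),
            (2, 1.0, epoch(2014, 12, 1)),  # outside 2015: filtered out by WHERE
            (3, 2.0, epoch(2016, 1, 1)),   # movie with no 2015 ratings at all
        ],
    )
    return conn


# Answer 0: join first, then filter, group and sort.
ANSWER_0 = """
SELECT m.movieid, m.title, variance(r.rating) AS var
FROM movies m JOIN ratings r ON m.movieid = r.movieid
WHERE unix_year(r.time_stamp) = 2015
GROUP BY m.movieid, m.title
ORDER BY var DESC
LIMIT 10
"""

# Answer 1: aggregate ratings first, keep the top 10, then join to movies.
ANSWER_1 = """
SELECT m.title, r.var
FROM (SELECT r.movieid, variance(r.rating) AS var
      FROM ratings AS r
      WHERE unix_year(r.time_stamp) = 2015
      GROUP BY r.movieid
      ORDER BY var DESC
      LIMIT 10) AS r
JOIN movies AS m ON m.movieid = r.movieid
ORDER BY r.var DESC
"""

conn = build_db()
joined_first = conn.execute(ANSWER_0).fetchall()
aggregated_first = conn.execute(ANSWER_1).fetchall()
print(joined_first)      # [(1, 'A', 4.0), (2, 'B', 0.25)]
print(aggregated_first)  # [('A', 4.0), ('B', 0.25)]
```

Both shapes return the same movies in the same order: the 2014 rating of movie B and the 2016-only movie C are removed by the year filter before aggregation. Aggregating before the join, as answer 1 does, means the join only sees at most ten pre-aggregated rows.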