我正在阅读Alan Gates的Pig编程。
考虑代码:
ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS
(userID:int, movieID:int, rating:int, ratingTime:int);
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage ('|') AS
(movieID:int, movieTitle:chararray, releaseDate:chararray, imdbLink: chararray);
nameLookup = FOREACH metadata GENERATE
movieID, movieTitle, ToDate(releaseDate, 'dd-MMM-yyyy') AS releaseYear;
nameLookupYear = FOREACH nameLookup GENERATE
movieID, movieTitle, GetYear(releaseYear) AS finalYear;
filterMovies = FILTER nameLookupYear BY finalYear < 1982;
groupedMovies = GROUP filterMovies BY finalYear;
orderedMovies = FOREACH groupedMovies {
sortOrder = ORDER metadata by finalYear DESC;
GENERATE GROUP, finalYear;
};
DUMP orderedMovies;
它表明
“按地图,元组或袋子排序会产生错误”。
我想知道如何对分组结果进行排序。
转换是否需要遵循一定的顺序才能生效?
答案 0 :(得分:1)
如果要对分组的值进行排序,则必须使用嵌套的foreach。这将在组中以降序对年份进行排序。
orderedMovies = FOREACH groupedMovies {
sortOrder = ORDER metadata by GetYear(ToDate(releaseDate, 'dd-MMM-yyyy')) DESC;
GENERATE GROUP, movieID, movieTitle;
};
答案 1 :(得分:1)
由于您要对分组结果进行排序,因此不需要嵌套的foreach。例如,如果要对年度内的每部电影按标题或发行日期进行排序,则可以使用嵌套的foreach。尝试照常订购(由于您在上一行中按finalYear
进行了分组,因此将group
称为finalYear
)
orderedMovies = ORDER groupedMovies BY group ASC;
DUMP orderedMovies;