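I have a list of movies with their release dates. Using Apache Pig, I want to get the movies that are newer than a given year, for example 1982, so movies from 1983, 1984, and so on.
The dates look like 01-Jan-1995 (the dd-MMM-yyyy pattern used in the script below). I can load the data correctly, but my FILTER operation reports a type mismatch.
I tried converting the chararray to a datetime, but the result is a date in the form 1995-01-01T00:00:00.000-08:00.
1) How do I retrieve only the year?
2) How do I filter for only the values newer than the chosen year?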
ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS (userID:int, movieID:int, rating:int, ratingTime:int);
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage ('|') AS (movieID:int, movieTitle:chararray, releaseDate:chararray, imdbLink: chararray);
nameLookup = FOREACH metadata GENERATE movieID, movieTitle, ToDate(releaseDate, 'dd-MMM-yyyy') AS releaseYear;
nameLookupYear = FOREACH nameLookup GENERATE movieID, movieTitle, ToString(releaseYear, 'yyyy') AS movieYear;
oldMovies = FILTER nameLookupYear by movieYear < ('1982');
DUMP oldMovies;
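Answer 0 (score: 1)
Use GetYear() to extract the year part of a datetime object as an int. If you want the movies that are newer than 1982, the filter should be movieYear > 1982: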
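-- releaseYear is the datetime produced by ToDate() in the question; GetYear() returns its year as an int.
nameLookupYear = FOREACH nameLookup GENERATE movieID, movieTitle, GetYear(releaseYear) AS movieYear;
-- Alias name kept from the question; with this condition it holds the movies released after 1982.
oldMovies = FILTER nameLookupYear by movieYear > 1982;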
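For reference, the two steps can be folded into one script. This is a minimal sketch rather than a tested job: it reuses the path, schema, and date pattern from the question as given, and the alias names movieYears and newMovies are made up for illustration. Because GetYear() yields an int, the comparison against the numeric literal 1982 involves no chararray, which is presumably what caused the type mismatch in the original FILTER.

```pig
-- Load the pipe-delimited movie metadata (path and schema copied from the question).
metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage('|')
    AS (movieID:int, movieTitle:chararray, releaseDate:chararray, imdbLink:chararray);

-- Parse the text date and extract the year as an int in a single pass.
-- movieYears is a hypothetical alias name, not from the original post.
movieYears = FOREACH metadata GENERATE movieID, movieTitle,
    GetYear(ToDate(releaseDate, 'dd-MMM-yyyy')) AS movieYear;

-- Keep only movies released after 1982 (1983, 1984, and so on).
-- newMovies is likewise a hypothetical alias name.
newMovies = FILTER movieYears BY movieYear > 1982;

DUMP newMovies;
```

To change the cutoff, replace 1982 with another int literal. Use >= instead of > if the chosen year itself should be included.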