I'm new to Pig and still getting familiar with it. I'm currently trying to merge four CSV files that are linked by movieId.
I want to combine them so that all of the data can be viewed together without movieId appearing more than once.
I tried:
moviesNew = LOAD 'moviesNew.csv' USING PigStorage(',') as (movieId:int, title:chararray, genres:chararray);
ratingsNew = LOAD 'ratingsNew.csv' USING PigStorage(',') as (userId:int, movieId:int, rating:int, timestamp:int);
tagsNew = LOAD 'tagsNew.csv' USING PigStorage(',') as (userId:int, movieId:int, tag:chararray, timestamp:int);
linksNew = LOAD 'linksNew.csv' USING PigStorage(',') as (movieId:int, imdbId:int, tmdbId:int);
joined = JOIN moviesNew by movieId, ratingsNew by movieId, tagsNew by movieId, linksNew by movieId;
dump joined;
But when I dump the result, I can't tell whether it worked.
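A minimal sketch, assuming the relation and field names from the script above, of how one might inspect the joined output and keep only one movieId column. After a multi-way JOIN, Pig keeps every input's key as a separate field, disambiguated with the `::` operator (moviesNew::movieId, ratingsNew::movieId, and so on), which is where the repeated movieId comes from. The aliases flat, sample_rows and grouped, and the renamed output fields such as ratingUserId and tagTs, are hypothetical. Also keep in mind that an inner JOIN over four relations emits one row for every combination of matching rating and tag rows for a movie, and drops any movie that is missing from one of the files.

```pig
-- Continuation of the script above. The aliases (flat, sample_rows, grouped)
-- and the renamed output fields are hypothetical, for illustration only.

-- 1) Show the schema of the join: each input's movieId is kept as its own
--    field (moviesNew::movieId, ratingsNew::movieId, tagsNew::movieId, linksNew::movieId).
DESCRIBE joined;

-- 2) Project a single movieId and drop the three redundant copies.
flat = FOREACH joined GENERATE
    moviesNew::movieId    AS movieId,
    moviesNew::title      AS title,
    moviesNew::genres     AS genres,
    ratingsNew::userId    AS ratingUserId,
    ratingsNew::rating    AS rating,
    ratingsNew::timestamp AS ratingTs,
    tagsNew::userId       AS tagUserId,
    tagsNew::tag          AS tag,
    tagsNew::timestamp    AS tagTs,
    linksNew::imdbId      AS imdbId,
    linksNew::tmdbId      AS tmdbId;
DESCRIBE flat;

-- 3) Dump a small sample rather than the whole (possibly very large) result.
sample_rows = LIMIT flat 10;
DUMP sample_rows;

-- 4) Alternative: one record per movieId. COGROUP collects the matching rows
--    from each file into a bag instead of multiplying them out, and by default
--    also keeps movies that have no rows in some of the files (empty bags).
grouped = COGROUP moviesNew BY movieId, ratingsNew BY movieId,
                  tagsNew BY movieId, linksNew BY movieId;
DESCRIBE grouped;
```

Which form fits better depends on how the data will be browsed. The flattened JOIN gives one flat row per (rating, tag) pair. The COGROUP output has a single `group` key followed by one bag per input file.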