Trying to merge four CSV files into one with Pig using a common column

Asked: 2019-11-03 22:05:53

Tags: apache-pig

I'm new to Pig and trying to get familiar with it. At the moment I'm trying to merge four CSV files that are linked by movieId:

moviesNew.csv

ratingNew.csv

tagsNew.csv

linksNew.csv

I want them combined into a single dataset that can be browsed in one place, without movieId being repeated.

Here is what I tried:

moviesNew = LOAD 'moviesNew.csv' USING PigStorage(',') as (movieId:int, title:chararray, genres:chararray);

ratingsNew = LOAD 'ratingsNew.csv' USING PigStorage(',') as (userId:int, movieId:int, rating:int, timestamp:int);

tagsNew = LOAD 'tagsNew.csv' USING PigStorage(',') as (userId:int, movieId:int, tag:chararray, timestamp:int);

linksNew = LOAD 'linksNew.csv' USING PigStorage(',') as (movieId:int, imdbId:int, tmdbId:int);

joined = JOIN moviesNew by movieId, ratingsNew by movieId, tagsNew by movieId, linksNew by movieId;

dump joined;

But when I dump it, I can't tell whether it actually worked.
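One way to check the result and drop the repeated key, offered as a sketch rather than a confirmed fix: after a multi-way JOIN, Pig keeps every input column and prefixes each with its source relation (for example `moviesNew::movieId`), so `joined` holds four copies of movieId. `DESCRIBE` prints that schema, and a `FOREACH ... GENERATE` projection can keep a single copy. The aliases `merged` and `preview`, the renamed columns such as `ratingUserId`, and the output directory `mergedMovies` are made up for illustration. Keep in mind that this inner join only retains movies present in all four files, and a movie with several ratings and several tags produces one row per rating/tag pair.

```pig
-- Assumes the four LOAD statements and the JOIN above have already run.

-- Print the schema of the joined relation; each column is prefixed with
-- the relation it came from, e.g. moviesNew::movieId, ratingsNew::movieId.
DESCRIBE joined;

-- Keep one movieId and rename the columns that would otherwise clash
-- (userId and timestamp exist in both ratingsNew and tagsNew).
-- The new column names are illustrative, not from the original script.
merged = FOREACH joined GENERATE
    moviesNew::movieId    AS movieId,
    moviesNew::title      AS title,
    moviesNew::genres     AS genres,
    ratingsNew::userId    AS ratingUserId,
    ratingsNew::rating    AS rating,
    ratingsNew::timestamp AS ratingTimestamp,
    tagsNew::userId       AS tagUserId,
    tagsNew::tag          AS tag,
    tagsNew::timestamp    AS tagTimestamp,
    linksNew::imdbId      AS imdbId,
    linksNew::tmdbId      AS tmdbId;

-- Look at a handful of rows instead of dumping the whole relation.
preview = LIMIT merged 10;
DUMP preview;

-- Optionally write the merged data out as CSV; 'mergedMovies' is a
-- hypothetical output directory and must not already exist.
STORE merged INTO 'mergedMovies' USING PigStorage(',');
```

If `DESCRIBE merged;` then lists movieId only once and the preview rows look sensible, the join did what was intended.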

0 answers:

No answers yet.