PIG - Scalar has more than one row in the output

Date: 2014-03-20 02:16:55

Tags: hadoop mapreduce apache-pig

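I have the following datasets from a movie database:

Ratings: UserID, MovieID, Rating
Movies: MovieID, Genre
Users: UserID, Gender, Age

I wrote a Pig script to find the female users in the 20-30 age group who rated the highest-rated movies. Here is my code so far: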

users_input = load '/users.dat' USING PigStorage('\u003B') as (UserID: long, gender: chararray, age: int, occupation: int, zip: long);
movies_input = load '/movies.dat' USING PigStorage('\u003B') as (MovieID: long, title: chararray, genre: chararray);
ratings_input = load '/ratings.dat' USING PigStorage('\u003B') as (UserID: long, MovieID: long, rating: int, timestamp: chararray);

movie_filter = filter movies_input by (genre matches '.*Action.*') OR (genre matches '.*War.*');

temp = COGROUP movie_filter by MovieID, ratings_input by MovieID;

temp1 = FILTER temp BY COUNT(movie_filter) > 0;

temp2 = FOREACH temp1 GENERATE group, AVG(ratings_input.rating) AS ratings;

temp3 = ORDER temp2 BY ratings DESC;

temp4 = LIMIT temp3 1;

temp5 = FOREACH temp4 GENERATE ratings;

temp6 = FILTER temp3 BY (temp5.ratings == ratings);

female_users = filter users_input by gender == 'F';
age_users = filter female_users by age >=20 AND age <=30;
age_use = FOREACH age_users GENERATE UserID;

MovID = FOREACH temp6 GENERATE group;

all_users_records = FILTER ratings_input BY (MovID.group == MovieID);

all_users = FOREACH all_users_records GENERATE UserID;

female_aged_records = FILTER all_users BY (UserID == age_use.UserID);

female_aged_users = FOREACH female_aged_records GENERATE UserID;

store all_users into '/output_pig' using PigStorage();

I ran this but ended up with the error: "Scalar has more than one row in the output. 1st : (11), 2nd :(24)"

Can anyone help me? Thanks in advance.

2 answers:

Answer 0: (score: 16)

As others have said, this is not a very helpful error message. Chances are you've got a dot where you need a double semi-colon.
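
To illustrate the difference between the two notations, here is a minimal sketch. It assumes the relations ratings_input and age_use from the question are already defined; the names joined and female_ratings are made up for illustration. It is one possible way to write the comparison, not necessarily the fix the answers here have in mind.

```pig
-- Sketch only: assumes ratings_input and age_use exist as defined in the question.

-- A dot applied to another relation inside an expression is a scalar projection.
-- Pig then expects age_use to contain exactly one row, otherwise it reports
-- "Scalar has more than one row in the output":
-- female_aged_records = FILTER all_users BY (UserID == age_use.UserID);

-- A JOIN matches against every row of age_use, so no single-row requirement applies.
joined = JOIN ratings_input BY UserID, age_use BY UserID;

-- After the JOIN both inputs contribute a UserID field; '::' states which one is meant.
female_ratings = FOREACH joined GENERATE age_use::UserID AS UserID, ratings_input::MovieID AS MovieID;
```

The dot form remains fine for a relation known to hold a single row, such as temp5 in the question, which is produced by LIMIT 1.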

Answer 1: (score: 7)

@jhofman, I think you mean a double colon (the relation operator '::') rather than a dot.

In the end, the Pig script should look like this:

...
temp2 = FOREACH temp1 GENERATE group, AVG(ratings_input::rating) AS ratings;
...
temp6 = FILTER temp3 BY (temp5::ratings == ratings);
...
all_users_records = FILTER ratings_input BY (MovID::group == MovieID);

all_users = FOREACH all_users_records GENERATE UserID;

female_aged_records = FILTER all_users BY (UserID == age_use::UserID);