以下是我拥有的数据和架构,以下是 - student_name,question_number,actual_result(任意 - 假/正确)
(b,q1,Correct)
(a,q1,false)
(b,q2,Correct)
(a,q2,false)
(b,q3,false)
(a,q3,Correct)
(b,q4,false)
(a,q4,false)
(b,q5,flase)
(a,q5,false)
我想要的是获得每个学生的计数,即a / b总数 他/她做出的正确和错误的答案。
答案 0 :(得分:1)
对于共享的用例,下面的猪脚本就足够了。
猪脚本:
student_data = LOAD 'student_data.csv' USING PigStorage(',') AS (student_name:chararray, question_number:chararray, actual_result:chararray);
student_data_grp = GROUP student_data BY student_name;
student_correct_answer_data = FOREACH student_data_grp {
answers = student_data.actual_result;
correct_answers = FILTER answers BY actual_result=='Correct';
incorrect_answers = FILTER answers BY actual_result=='false';
GENERATE group AS student_name, COUNT(correct_answers) AS correct_ans_count, COUNT(incorrect_answers) AS incorrect_ans_count ;
};
输入:student_data.csv:
b,q1,Correct
a,q1,false
b,q2,Correct
a,q2,false
b,q3,false
a,q3,Correct
b,q4,false
a,q4,false
b,q5,false
a,q5,false
输出:DUMP kpi:
-- schema : (student_name, correct_ans_count, incorrect_ans_count)
(a,1,4)
(b,2,3)
参考:有关嵌套FOR EACH
的详细信息答案 1 :(得分:1)
使用此:
data = LOAD '/abc.txt' USING PigStorage(',') AS (name:chararray, number:chararray,result:chararray);
B = GROUP data by (name,result);
C = foreach B generate FLATTEN(group) as (name,result), COUNT(data) as count;
并且答案如下:
(a,false,4)
(a,Correct,1)
(b,false,3)
(b,Correct,2)
希望这是您正在寻找的输出