Join and group in Hadoop Pig

Time: 2016-03-13 22:45:32

Tags: hadoop apache-pig

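I often see people use either GROUP BY or JOIN to solve the same problem. Suppose I have a students table and a scores table, and I want to find each student's name together with their course scores. It seems this can be done with either JOIN or GROUP BY. What are the pros and cons of the two approaches? The data structures and code are below. Thanks.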

students table:

student ID, student name, student email address

score table:

student ID, course ID, score

student_scores = group students by (studentId) inner, scores by (studentId);

student_scores = join students by studentId, scores by studentId;

1 answer:

Answer 0 (score: 1)

The Pig Latin manual, in its section on JOIN, says:

Note the following about the GROUP/COGROUP and JOIN operators:

The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).

I am not sure whether these count as pros and cons, but the two operators do behave differently.
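To make the two differences concrete, here is a rough Python model of them. It is not Pig code and not how Pig is implemented. It assumes the null rules described in the Pig documentation: an inner JOIN drops rows whose key is null, while a COGROUP over several relations puts null keys from different relations into separate groups. The sample rows and the function names `inner_join` and `cogroup` are made up for illustration.

```python
# Hypothetical sample rows; a None key stands in for a Pig null.
students = [
    (1, "Ann", "ann@example.com"),
    (2, "Bob", "bob@example.com"),
    (None, "Eve", "eve@example.com"),
]
scores = [(1, "math", 90), (1, "art", 75), (None, "math", 60)]


def inner_join(left, right):
    """Flat output, like JOIN: one tuple per matching pair of rows.

    Rows with a null key never match anything, so they are dropped.
    """
    return [l + r for l in left for r in right
            if l[0] is not None and l[0] == r[0]]


def cogroup(left, right, left_inner=False):
    """Nested output, like COGROUP: one (key, left_bag, right_bag) per key.

    Null keys from the two inputs are kept in separate groups.
    With left_inner=True, groups whose left bag is empty are dropped,
    which models `students by studentId inner`.
    """
    groups = {}                      # non-null key -> [left_bag, right_bag]
    null_left, null_right = [], []
    for row in left:
        if row[0] is None:
            null_left.append(row)
        else:
            groups.setdefault(row[0], [[], []])[0].append(row)
    for row in right:
        if row[0] is None:
            null_right.append(row)
        else:
            groups.setdefault(row[0], [[], []])[1].append(row)

    out = [(k, groups[k][0], groups[k][1]) for k in sorted(groups)]
    if null_left:
        out.append((None, null_left, []))
    if null_right:
        out.append((None, [], null_right))
    if left_inner:
        out = [g for g in out if g[1]]
    return out


joined = inner_join(students, scores)
grouped = cogroup(students, scores, left_inner=True)

for row in joined:
    print("JOIN   ", row)
for row in grouped:
    print("COGROUP", row)
```

In this model the join yields only Ann's two score rows, already flat. The cogroup yields one nested row per student key, including Bob and Eve with empty score bags, so it behaves more like an outer grouping that still has to be flattened before it looks like the join result. For the question as asked (student names with their course scores), the flat JOIN output is the more direct fit; the grouped form is useful when you want to aggregate per student afterwards.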