Question

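I often see people use group by and join for the same problem. Suppose I have a students table and a scores table, and I want to find student names together with their course scores. It seems this can be solved with either join or group by. What are the pros and cons of the two solutions? The data structure and code are posted below. Thanks.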

table students:

student ID, student name, student email address

score table:

student ID, course ID, score

student_scores = group students by (studentId) inner, scores by (studentId);

student_scores = join students by studentId, scores by studentId;

Answer 1

The Pig Latin manual's section on JOIN says:

Note the following about the GROUP/COGROUP and JOIN operators:

The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).

I'm not sure whether that counts as pros and cons, but the two operators are different.
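To make the nested-versus-flat distinction from the manual concrete, here is a small Python sketch that only mimics the output shapes of the two statements in the question. It is not Pig code: the sample rows and the helper names `cogroup` and `join` are made up for illustration, and Pig's null-key handling is not modeled.

```python
from collections import defaultdict

# Hypothetical sample rows; the field order follows the question's tables.
students = [(1, "Ann", "ann@example.com"), (2, "Bob", "bob@example.com")]
scores = [(1, 101, 90), (1, 102, 85), (2, 101, 70)]


def cogroup(left, right):
    """Mimic the nested shape: one record per key, with a bag from each input."""
    bags = defaultdict(lambda: ([], []))
    for row in left:
        bags[row[0]][0].append(row)
    for row in right:
        bags[row[0]][1].append(row)
    # INNER on the left input only, as in the question's statement:
    # drop keys that have no student row.
    return [(key, l, r) for key, (l, r) in sorted(bags.items()) if l]


def join(left, right):
    """Mimic the flat shape: one record per matching (left, right) pair."""
    return [l + r for l in left for r in right if l[0] == r[0]]


nested = cogroup(students, scores)  # 2 records, one per student
flat = join(students, scores)       # 3 records, one per score row
```

In this sketch the nested result keeps all of a student's scores together in one record, which suits per-student aggregation such as an average, while the flat result repeats the student fields on every score row, which suits row-by-row processing.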

Join and group in Hadoop Pig
