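I often see people use either GROUP BY or JOIN for the same problem. Suppose I have a students table and a scores table, and I want to find each student's name together with their course scores. It looks like this can be solved with either JOIN or GROUP BY. What are the pros and cons of the two approaches? The data structures and code are below. Thanks.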
students table:
student ID, student name, student email address
scores table:
student ID, course ID, score
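-- Option 1: group both relations by studentId (nested output)
student_scores = GROUP students BY studentId INNER, scores BY studentId;
-- Option 2: join both relations on studentId (flat output)
student_scores = JOIN students BY studentId, scores BY studentId;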
Answer 0 (score: 1)
The Pig Latin manual's section on JOIN says:
Note the following about the GROUP/COGROUP and JOIN operators:
The GROUP and JOIN operators perform similar functions. GROUP creates a nested set of output tuples while JOIN creates a flat set of output tuples.
The GROUP/COGROUP and JOIN operators handle null values differently (see Nulls and JOIN Operator).
I'm not sure whether these count as pros and cons, but the two operators do behave differently.
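To make the nested-versus-flat distinction concrete, here is a minimal sketch in plain Python that models the two operators on in-memory rows. It is not Pig code and does not use any Pig API. The sample rows and the function names `inner_join` and `cogroup` are hypothetical and exist only for illustration. The null handling follows my reading of the Pig documentation (an inner JOIN drops tuples whose key is null, while COGROUP keeps null keys but groups them separately per relation), so treat that part as an assumption to check against the manual. The INNER modifier from the question is not modeled.

```python
# Hypothetical sample rows: (studentId, name, email) and (studentId, courseId, score).
students = [
    (1, "Ann", "ann@example.com"),
    (2, "Bob", "bob@example.com"),
    (None, "Cy", "cy@example.com"),   # null key
]
scores = [
    (1, "math", 90),
    (1, "art", 85),
    (None, "math", 70),               # null key
]


def inner_join(left, right):
    """Model of JOIN: flat output, one tuple per matching pair.

    Rows whose key is None never match anything, so they are dropped.
    """
    by_key = {}
    for row in right:
        if row[0] is not None:
            by_key.setdefault(row[0], []).append(row)
    return [
        l + r
        for l in left
        if l[0] is not None
        for r in by_key.get(l[0], [])
    ]


def cogroup(left, right):
    """Model of GROUP/COGROUP on two relations: nested output.

    Produces one (key, left_bag, right_bag) tuple per key. Null keys are
    kept, but nulls from each relation form their own group (assumption
    based on the Pig documentation on nulls).
    """
    keys = sorted({row[0] for row in left + right if row[0] is not None})
    out = [
        (k, [l for l in left if l[0] == k], [r for r in right if r[0] == k])
        for k in keys
    ]
    null_left = [l for l in left if l[0] is None]
    null_right = [r for r in right if r[0] is None]
    if null_left:
        out.append((None, null_left, []))
    if null_right:
        out.append((None, [], null_right))
    return out


joined = inner_join(students, scores)
grouped = cogroup(students, scores)
```

In this model `joined` holds one flat row per (student, score) pair, so Ann appears twice and Bob and the null-key rows disappear. `grouped` holds one row per key with two bags, so Bob is still present with an empty scores bag. Which shape is preferable depends on whether the next step needs per-student aggregation (nested) or row-by-row processing (flat).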