Comparing two relations in Pig Latin

Time: 2014-08-04 08:30:21

Tags: apache-pig

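How can I check whether two large relations contain exactly the same records?

The two relations can have many records, for example 1 million rows with 500 columns each. How can I confirm that every record in one relation is identical to a record in the other relation?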

1 answer:

Answer 0 (score: 0)

Try this:

1. Load the first relation into an alias with a single column, say 'a', so that each whole row is one value.
2. Count its rows.
3. Load the second relation into another alias with a single column, say 'b'.
4. Count its rows.
5. Inner join the two relations on columns a and b.
6. Count the rows in the joined relation.
7. Compare the join count with the counts of both relations. If all three counts are equal, the two relations contain the same data. Comparing against only one of the counts is not enough, because a relation that is a subset of the other would also pass. The check also assumes that neither relation contains duplicate rows, since a row that appears more than once on both sides is multiplied by the join.
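
The steps above can be sketched in Pig Latin roughly as follows. This is a minimal sketch, not a tested production script: the input paths 'input/relation_a' and 'input/relation_b' and all alias names are hypothetical, and it assumes each record sits on one line of a text file so that TextLoader can read the whole row as a single chararray. The last part uses COGROUP, which is not part of the answer above, as one possible way to handle duplicate rows.

```pig
-- Hypothetical input paths; TextLoader reads each line as one chararray field.
A = LOAD 'input/relation_a' USING TextLoader() AS (a:chararray);
B = LOAD 'input/relation_b' USING TextLoader() AS (b:chararray);

-- Steps 2 and 4: row count of each relation.
A_grp = GROUP A ALL;
A_cnt = FOREACH A_grp GENERATE COUNT_STAR(A) AS n;
B_grp = GROUP B ALL;
B_cnt = FOREACH B_grp GENERATE COUNT_STAR(B) AS n;

-- Steps 5 and 6: inner join on the whole row, then count the result.
J = JOIN A BY a, B BY b;
J_grp = GROUP J ALL;
J_cnt = FOREACH J_grp GENERATE COUNT_STAR(J) AS n;

-- Step 7: the three printed counts should all be equal.
DUMP A_cnt;
DUMP B_cnt;
DUMP J_cnt;

-- Alternative (an assumption, not from the answer): group both relations
-- by row and keep only rows whose occurrence counts differ.
C = COGROUP A BY a, B BY b;
D = FOREACH C GENERATE group AS rec, COUNT_STAR(A) AS in_a, COUNT_STAR(B) AS in_b;
mismatch = FILTER D BY in_a != in_b;
DUMP mismatch;
```

If `mismatch` comes back empty, every distinct row occurs the same number of times in both relations. Unlike the count comparison, this variant also shows which rows differ.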