Question

How to compare two text files in Pig

Example file 1:

123456 raj kall dno 23  23-02-1984  xyz
123457 Tal dall dno 23  23-02-1985  xyz
123458 aaa fff  dno 23  23-02-1986  xyz
123459 gg  hhhh dno 23  23-02-1987  xyz
123460 aa  hhhh dno 23  23-02-1987  xyz
123461 bbb hhhh dno 23  23-02-1987  xyz

File 2:

123456 raj kall dno 23  23-02-1984  xyz
123457 Tal dall dno 23  23-02-1985  xyz
123458 aaa uuu  dno 23  23-02-1986  xyz
123459 gg  hhhh dno 23  23-02-1987  xyz
123461 bb  hhhh dno 23  23-02-1987  xyz

Expected output (lines of file 1 that do not appear in file 2):

123458 aaa fff  dno 23  23-02-1986  xyz
123460 aa  hhhh dno 23  23-02-1987  xyz
123461 bbb hhhh dno 23  23-02-1987  xyz

Answer 1

If you want A - B (the lines that exist in A but not in B), use a left outer join and keep only the rows whose right-hand (B) side is null.

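-- Load each file as one whole line per record (PigStorage() would split on tabs)
A = LOAD 'file1' USING TextLoader() AS (input:chararray);
B = LOAD 'file2' USING TextLoader() AS (input:chararray);

-- Every line of A is kept; B::input is null when that line has no exact match in B
C = JOIN A BY input LEFT OUTER, B BY input;
D = FILTER C BY B::input IS NULL;
E = FOREACH D GENERATE A::input;
DUMP E;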

I haven't run this code, so it may contain syntax errors. Hope this helps.
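An alternative sketch, not part of the original answer: the same A - B difference can also be written with COGROUP and Pig's built-in IsEmpty function, keeping only the groups in which B contributed no rows. It assumes the same input paths 'file1' and 'file2' and loads whole lines with TextLoader; the field name `line` is illustrative.

```pig
-- Assumption: inputs are 'file1' and 'file2', one record per line
A = LOAD 'file1' USING TextLoader() AS (line:chararray);
B = LOAD 'file2' USING TextLoader() AS (line:chararray);

-- Group both relations by the full line; each group holds a bag from A and a bag from B
C = COGROUP A BY line, B BY line;
-- Keep groups where the line never occurs in B
D = FILTER C BY IsEmpty(B);
-- Unnest A's bag so each original line (including duplicates in A) is emitted
E = FOREACH D GENERATE FLATTEN(A);
DUMP E;
```

Both versions compare whole lines exactly, so "aaa fff" and "aaa uuu" count as different lines. In either version the output row order is not guaranteed; add an ORDER BY if the order of the expected output matters.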
