I have the following Pig script:
A = LOAD 'text_a.txt' USING PigStorage();
B = LOAD 'text_b.txt' USING PigStorage();
SOMETHING = FILTER A BY $0 matches 'SOMETHING';
FOOBAR = FILTER A BY $0 matches 'FOOBAR';
SOMETHING_B = JOIN SOMETHING BY key, B BY $1;
FOOBAR_B = JOIN FOOBAR BY key, B BY $1;
TEMP = JOIN SOMETHING_B BY key, FOOBAR_B by key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::$1 - FOOBAR_B::$1;
dump OUT;
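When this script runs, it looks like the data in A and B is read from the source twice. Is there a way to prevent it from being read a second time?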
Answer 0 (score: 0)
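First, put an "explain" at the end of the script to determine whether the data is actually being read twice.
Looking at your script, it does not look like A and B are called twice.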
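
As a rough sketch of that check, the script below replaces the final DUMP with EXPLAIN, which prints the logical, physical, and execution plans instead of running the job. The field names (tag, key, val) and their types are assumptions added only so the joins can refer to columns by name; the original LOAD statements declare no schema, so adjust these to the real layout of the files.

```pig
-- Hypothetical schemas: the question's LOAD statements have no AS clause.
A = LOAD 'text_a.txt' USING PigStorage() AS (tag:chararray, key:chararray);
B = LOAD 'text_b.txt' USING PigStorage() AS (val:long, key:chararray);

-- Both filters derive from the same alias A.
SOMETHING = FILTER A BY tag matches 'SOMETHING';
FOOBAR = FILTER A BY tag matches 'FOOBAR';

SOMETHING_B = JOIN SOMETHING BY key, B BY key;
FOOBAR_B = JOIN FOOBAR BY key, B BY key;

-- After a join, fields are disambiguated with the source alias and '::'.
TEMP = JOIN SOMETHING_B BY SOMETHING::key, FOOBAR_B BY FOOBAR::key;
OUT = FOREACH TEMP GENERATE SOMETHING_B::B::val - FOOBAR_B::B::val;

-- Print the plans for OUT instead of executing the pipeline.
EXPLAIN OUT;
```

In the printed plans, count how many Load operators appear for 'text_a.txt' and 'text_b.txt'. A single Load per file feeding several downstream operators suggests the input is shared rather than read twice.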