I need help getting rid of the null values in the result of a full outer join in Pig Latin. Here are the two datasets:
A:
(BOS,2)
(BUR,81)
(LAS,8)
B:
(BUR,56)
(EWR,2)
(LAS,88)
After the full outer join, C:
(BOS,2,,)
(BUR,81,BUR,56)
(,,EWR,2)
(LAS,8,LAS,88)
I need the output in the following format, i.e. one row per airport with the two counts added together:
(BOS,2)
(BUR,137)
(EWR,2)
(LAS,96)
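A minimal sketch, assuming A and B are stored as comma-separated files at the hypothetical paths below. Because the goal is just a per-airport sum, the join (and therefore its nulls) can be skipped entirely: stack A and B with UNION, then GROUP by airport and SUM the counts. The aliases airport, cnt, AB and result are illustrative names, not part of the original script.

```pig
-- Hypothetical input files; each line looks like "BOS,2".
A = LOAD '/demo/data/A.csv' USING PigStorage(',') AS (airport:chararray, cnt:long);
B = LOAD '/demo/data/B.csv' USING PigStorage(',') AS (airport:chararray, cnt:long);

-- Both relations have the same schema, so UNION just stacks the rows;
-- no null placeholders are created, unlike a FULL OUTER join.
AB = UNION A, B;

-- One group per airport; SUM adds the counts coming from A and from B.
grouped = GROUP AB BY airport;
result = FOREACH grouped GENERATE group AS airport, SUM(AB.cnt) AS total;

DUMP result;
```

On the sample data this should give (BOS,2), (BUR,137), (EWR,2) and (LAS,96), although Pig does not guarantee row order. The same idea applies to the script below by renaming Origin and Dest to a common field name in traffic_in_count and traffic_out_count before the UNION.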
I have tried different combinations of GROUP BY, FLATTEN and BagToTuple, but could not figure out a solution. Any help is greatly appreciated. My script:
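-- Load the origin and destination airport of each flight.
airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin:chararray, Dest:chararray);
-- Number of flights per origin airport.
traffic_in = GROUP airline by Origin;
traffic_in_count = FOREACH traffic_in generate group as Origin, COUNT(airline) as count;
-- Number of flights per destination airport.
traffic_out = GROUP airline by Dest;
traffic_out_count = FOREACH traffic_out generate group as Dest, COUNT(airline) as count;
-- FULL OUTER keeps airports that appear on only one side, with nulls for the other side.
traffic_top = JOIN traffic_in_count by Origin FULL OUTER, traffic_out_count by Dest;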
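A sketch that keeps the FULL OUTER join above and fixes the nulls afterwards with the bincond operator (?:). After a JOIN, fields are disambiguated with their source prefix (traffic_in_count::Origin, traffic_out_count::count, and so on), and on an unmatched row every field from the missing side is null. In Pig, adding null to a number yields null, so each count is replaced by 0L before the addition, and the airport code is taken from whichever side is present. The names traffic_total, airport and total are hypothetical.

```pig
-- Continues from traffic_top; COUNT returns long, hence the 0L defaults.
traffic_total = FOREACH traffic_top GENERATE
    -- Use the origin key if present, otherwise the destination key.
    (traffic_in_count::Origin IS NOT NULL ? traffic_in_count::Origin : traffic_out_count::Dest) AS airport,
    -- Treat a missing side as 0 so the sum is never null.
    (traffic_in_count::count IS NULL ? 0L : traffic_in_count::count)
        + (traffic_out_count::count IS NULL ? 0L : traffic_out_count::count) AS total;

DUMP traffic_total;
```

If the counts of A and B were fed through this step, (BOS,2,,) would become (BOS,2) and (,,EWR,2) would become (EWR,2), while matched rows such as BUR are summed.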