I have a table that I load as follows.
A = load 'customer' using PigStorage('|');
The customer file contains these rows:
7|Ron|ron@abc.com
8|Rina
9|Don|dmes@xyz.com
9|Don|dmes@xyz.com
10|Maya|maya@cnn.com
11|marry|mary@abc.com
When I use the following ...
B = DISTINCT A;
A_CLEAN = FILTER B by ($0 is not null) AND ($1 is not null) AND ($2 is not null);
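it removes 8|Rina as well.
How do I remove only the empty rows with Pig?
Is there something I can try, such as A_CLEAN = FILTER B BY NOT IsNULL() ???
I am new to Pig, so I am not sure what I would put inside IsNULL ...
Thanks
A_CLEAN = FILTER B BY NOT IsEmpty(B);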
Answer 0 (score: 2)
Try the following:
A = LOAD 'customer' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL));
DUMP A_CLEAN;
This produces the output:
(8,Rina)
(7,Ron,ron@abc.com)
(9,Don,dmes@xyz.com)
(10,Maya,maya@cnn.com)
(11,marry,mary@abc.com)
In Pig, you cannot test a tuple for emptiness.
Answer 1 (score: 0)
Tarun, why not use an OR condition instead of AND?
A_CLEAN = FILTER B by ($0 is not null) OR ($1 is not null) OR ($2 is not null);
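This removes the rows in which every column is null and keeps any row that has at least one non-null column.
Can you try it and let me know whether it works for all of your cases?
Update:
I don't know why IsEmpty() is not working for you; it works for me.
IsEmpty works only on bags, so I convert all of the fields to a bag and test that bag for emptiness. See the working code below.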
input.txt
7|Ron|ron@abc.com
8|Rina
9|Don|dmes@xyz.com
9|Don|dmes@xyz.com
10|Maya|maya@cnn.com
11|marry|mary@abc.com
Pig script:
A = LOAD 'input.txt' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT IsEmpty(TOBAG($0..));
DUMP A_CLEAN;
Output:
(8,Rina )
(7,Ron,ron@abc.com)
(9,Don,dmes@xyz.com)
(10,Maya,maya@cnn.com)
(11,marry,mary@abc.com)
As for your other question, it comes down to simple Boolean logic.
In case of AND,
8|Rina
will be treated as
($0 is not null) AND ($1 is not null) AND ($2 is not null)
(true) AND (true) AND (false)
(false) --> so this record is skipped by the FILTER command
In case of OR,
8|Rina
will be treated as
($0 is not null) OR ($1 is not null) OR ($2 is not null)
(true) OR (true) OR (false)
(true) --> so this record is kept in the relation by the FILTER command
In case of empty record,
<empty record>
will be treated as
($0 is not null) OR ($1 is not null) OR ($2 is not null)
(false) OR (false) OR (false)
(false) --> so this record is skipped by the FILTER command
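The three cases above can be checked mechanically. The following is only a Python model of the Boolean logic, not Pig code: None stands in for a Pig null, and the helper names (pad, keep_and, keep_or, keep_not_all_null) are made up for this illustration. It also assumes that fields missing from a short or empty line are seen as null, which is how the walkthrough above treats them.

```python
# Python model of the three FILTER predicates discussed in this thread.
# Assumption: a field that is missing from a line is read as null (None here).

def pad(line, width=3, sep="|"):
    """Split a line into fields and pad it with None up to `width` fields."""
    fields = line.split(sep) if line else []
    fields = [f if f != "" else None for f in fields]
    return tuple(fields + [None] * (width - len(fields)))

def keep_and(row):
    # ($0 is not null) AND ($1 is not null) AND ($2 is not null)
    return all(f is not None for f in row)

def keep_or(row):
    # ($0 is not null) OR ($1 is not null) OR ($2 is not null)
    return any(f is not None for f in row)

def keep_not_all_null(row):
    # NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL))
    return not all(f is None for f in row)

# A full row, the short row from the question, and an empty line.
rows = [pad(line) for line in ["7|Ron|ron@abc.com", "8|Rina", ""]]

for row in rows:
    print(row, keep_and(row), keep_or(row), keep_not_all_null(row))
```

In this model the OR form and the NOT(... AND ...) form from the first answer agree on every row (De Morgan's law), while the AND form is the only one that also drops the 8|Rina row.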