cust_joined = JOIN cust_filtered BY (LOW, HIGH, NORMAL), cust_conversion BY (Low, High, Normal);
All of these fields have the data type chararray.
cust_filtered contains the following entries (I stored the relation, and this is what the file contains):
cust_id Val1 Val2 Year Low High Normal Prod-code
1925635222 16.2 61.2 2013 null null <=6.9 1234548-5
9253821456 16.8 65.8 2014 null null <7.0 4548567-9
Sample entries in cust_conversion (I stored the relation, and this is what the file contains):
Low High Normal Cust-Session Price
null null <=6.9 ABC-1234 16.9
null null <=7.0 PQR-4567 87.0
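Unfortunately, the cust_joined relation is empty. Any help with this would be great.
Answer 0: (score: 0)
I don't know why it isn't working for you; it works for me. Can you paste your Pig script? Here is what I ran: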
cust_filtered = LOAD 'cust_filtered.txt' USING PigStorage(' ') AS (cust_id:long, Val1:float, Val2:float, Year:chararray, LOW:chararray, HIGH:chararray, NORMAL:chararray, Prod_code:chararray);
cust_conversion = LOAD 'cust_conversion.txt' USING PigStorage(' ') AS (Low:chararray, High:chararray, Normal:chararray, Cust_Session:chararray, Price:float);
cust_joined = JOIN cust_filtered BY (LOW, HIGH, NORMAL), cust_conversion BY (Low, High, Normal);
DUMP cust_joined;
OUTPUT:
(1925635222,16.2,61.2,2013,null,null,<=6.9,1234548-5,null,null,<=6.9,ABC-1234,16.9)
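Answer 1: (score: 0)
I assume that where your data shows null, those are actual Pig null values. Is that correct?
If so, that is the cause, because Pig inner joins ignore null keys: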
The JOIN operator always performs an inner join. Inner joins ignore null keys, so it makes sense to filter them out before the join.
See http://pig.apache.org/docs/r0.12.0/basic.html#join-inner
If they are instead chararrays holding the string "null" as their value, then the join should basically work, as the other answer shows.
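If the keys do turn out to be real Pig nulls, one possible workaround is to swap them for a placeholder string before joining. The following is only a sketch under that assumption: it reuses the relation and field names from the script in the other answer, while the aliases null_keys, cust_filtered_keys and cust_conversion_keys and the sentinel '__NULL__' are made up for illustration and do not come from the question.

```pig
-- Check first whether any join key is an actual null rather than the string "null".
null_keys = FILTER cust_filtered BY LOW IS NULL OR HIGH IS NULL OR NORMAL IS NULL;
DUMP null_keys;

-- Replace null keys with a sentinel so the inner join no longer drops those rows.
-- '__NULL__' is an arbitrary placeholder; pick a value that cannot occur in the real data.
cust_filtered_keys = FOREACH cust_filtered GENERATE
    cust_id, Val1, Val2, Year,
    (LOW IS NULL ? '__NULL__' : LOW) AS LOW,
    (HIGH IS NULL ? '__NULL__' : HIGH) AS HIGH,
    (NORMAL IS NULL ? '__NULL__' : NORMAL) AS NORMAL,
    Prod_code;

cust_conversion_keys = FOREACH cust_conversion GENERATE
    (Low IS NULL ? '__NULL__' : Low) AS Low,
    (High IS NULL ? '__NULL__' : High) AS High,
    (Normal IS NULL ? '__NULL__' : Normal) AS Normal,
    Cust_Session, Price;

-- Same join as before, now on keys that contain no nulls.
cust_joined = JOIN cust_filtered_keys BY (LOW, HIGH, NORMAL),
                   cust_conversion_keys BY (Low, High, Normal);
DUMP cust_joined;
```

If null_keys comes back empty, the keys are not real nulls and the cause of the empty join lies elsewhere, for example in the delimiter passed to PigStorage or in values that differ slightly between the two files.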