Apache Pig join returns an empty relation

Asked: 2014-10-23 06:25:29

Tags: apache-pig

cust_joined = JOIN cust_filtered BY (LOW, HIGH, NORMAL), cust_conversion BY (Low, High, Normal);

All of these fields have the data type chararray.

cust_filtered contains the following entries (I stored this relation, and these rows are present in the output file):

cust_id           Val1      Val2     Year      Low  High  Normal  Prod-code
1925635222        16.2      61.2     2013      null null  <=6.9   1234548-5
9253821456        16.8      65.8     2014      null null  <7.0    4548567-9

Sample entries in cust_conversion (I stored this relation, and these rows are present in the output file):

Low     High   Normal     Cust-Session     Price
null    null    <=6.9      ABC-1234        16.9
null    null    <=7.0      PQR-4567        87.0

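Unfortunately, the cust_joined relation is empty. Any help with this would be great.

2 answers:

Answer 0 (score: 0)

I don't know why it isn't working for you; it works for me. Can you paste your Pig script?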

cust_filtered = LOAD 'cust_filtered.txt' USING PigStorage(' ') AS(cust_id:long,Val1:float,Val2:float,Year:chararray, LOW:chararray, HIGH:chararray, NORMAL:chararray, Prod_code:chararray);
cust_conversion = LOAD 'cust_conversion.txt' USING PigStorage(' ') AS (Low:chararray, High:chararray, Normal:chararray, Cust_Session:chararray, Price:float);
cust_joined = JOIN cust_filtered BY (LOW, HIGH, NORMAL), cust_conversion BY (Low, High, Normal);
DUMP cust_joined;

OUTPUT:
(1925635222,16.2,61.2,2013,null,null,<=6.9,1234548-5,null,null,<=6.9,ABC-1234,16.9)

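Answer 1 (score: 0)

I assume that where your data shows null, you have actual Pig null values. Is that correct?

If so, that is the cause, because Pig inner joins ignore null keys: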

The JOIN operator always performs an inner join. Inner joins ignore null keys, so it makes sense to filter them out before the join.

See http://pig.apache.org/docs/r0.12.0/basic.html#join-inner
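
One way to tell which case applies is to filter on the join keys directly. This is only a sketch: the file name and schema are borrowed from the first answer and are assumptions about the asker's actual script.

```pig
-- File name and schema taken from the first answer; adjust to the real script.
cust_filtered = LOAD 'cust_filtered.txt' USING PigStorage(' ')
    AS (cust_id:long, Val1:float, Val2:float, Year:chararray,
        LOW:chararray, HIGH:chararray, NORMAL:chararray, Prod_code:chararray);

-- Rows where at least one join key is a real Pig null.
real_nulls = FILTER cust_filtered BY LOW IS NULL OR HIGH IS NULL OR NORMAL IS NULL;

-- Rows where a key holds the literal string 'null' instead.
string_nulls = FILTER cust_filtered BY LOW == 'null' OR HIGH == 'null' OR NORMAL == 'null';

DUMP real_nulls;
DUMP string_nulls;
```

If the rows show up in real_nulls, the inner join drops them. DUMP prints a real null as an empty field, so it does not look like the word null in the output.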

If these are chararrays holding the string "null" as their value, then the join should basically work, as the other answer shows.
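
If the keys do turn out to be real nulls and those rows still need to match each other, one possible workaround is to replace nulls with a placeholder string before joining. The sketch below assumes the file names and schemas from the first answer; the placeholder '__NULL__' and the *_k field names are made up for illustration, and the placeholder must not occur as a real value in the data.

```pig
-- File names and schemas taken from the first answer; adjust to the real script.
cust_filtered = LOAD 'cust_filtered.txt' USING PigStorage(' ')
    AS (cust_id:long, Val1:float, Val2:float, Year:chararray,
        LOW:chararray, HIGH:chararray, NORMAL:chararray, Prod_code:chararray);
cust_conversion = LOAD 'cust_conversion.txt' USING PigStorage(' ')
    AS (Low:chararray, High:chararray, Normal:chararray,
        Cust_Session:chararray, Price:float);

-- Add non-null copies of the join keys ('__NULL__' is a hypothetical placeholder).
cust_filtered_k = FOREACH cust_filtered GENERATE *,
    (LOW IS NULL ? '__NULL__' : LOW) AS low_k,
    (HIGH IS NULL ? '__NULL__' : HIGH) AS high_k,
    (NORMAL IS NULL ? '__NULL__' : NORMAL) AS normal_k;

cust_conversion_k = FOREACH cust_conversion GENERATE *,
    (Low IS NULL ? '__NULL__' : Low) AS low_k,
    (High IS NULL ? '__NULL__' : High) AS high_k,
    (Normal IS NULL ? '__NULL__' : Normal) AS normal_k;

-- Join on the substituted keys so rows with null keys can match each other.
cust_joined = JOIN cust_filtered_k BY (low_k, high_k, normal_k),
                   cust_conversion_k BY (low_k, high_k, normal_k);

DUMP cust_joined;
```

This treats a null on one side as equal to a null on the other, which is not how Pig compares nulls by default, so it only makes sense when a missing value really should count as a matching key.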