Syntax error when joining on multiple keys in Pig

Time: 2014-04-02 11:50:09

Tags: apache-pig

I am trying to join the contents of two files in Pig:

StringFile = load 'String' using PigStorage(',') as (name,branch,div); -- string values
NumFile = load 'num' using PigStorage(',') as (id,m1,m2,m3,m4); -- numeric values
joined = join id by name,(m1,m2) by branch,div by (m3,m4);
store joined into 'joinedfile' using PigStorage(',');

But it fails with:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <file filterjoin.pig, line 4, column 14>  Syntax error, unexpected symbol at or near '('
  

Anju,IT,A -- StringFile

1,5.3,3.6,1.6,0.3 -- NumFile

Desired output:

1,Anju,5.3,3.6,IT,A,1.6,0.3

Am I doing something wrong?

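From the textbook:

You can also join on multiple keys. In all cases, you must have the same number of keys, and they must be of the same or compatible types (where compatible means that an implicit cast can be inserted).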

1. Does this mean both sides must have the same number of keys?

    id by name
    (m1,m2) by branch
    div by (m3,m4)

Is this not possible?

2. When joining, do the data types have to be the same?

1 answer:

Answer 0 (score: 1)

I think you have misunderstood what join does. It joins two datasets on a common element, so the syntax is:

C = join A by a1, B by b1;

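where a1 and b1 are fields of their respective relations A and B, and the two fields hold the common values that rows are matched on.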
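
As for the textbook passage quoted in the question, a multi-key join follows the same pattern: each relation is named once, followed by BY and a parenthesized list of its own key fields, with the same number of keys on both sides. A minimal sketch, in which the relations, field names, and the file names 'a.csv' and 'b.csv' are hypothetical and not taken from the question:

```pig
-- Hypothetical inputs: both relations carry the same two key columns.
A = LOAD 'a.csv' USING PigStorage(',') AS (k1:int, k2:chararray, x:double);
B = LOAD 'b.csv' USING PigStorage(',') AS (k1:int, k2:chararray, y:double);

-- Each relation alias comes first, then BY, then that relation's key list.
-- Both lists have two keys, and they are compared position by position.
C = JOIN A BY (k1, k2), B BY (k1, k2);

DUMP C;
```

The statement in the question most likely fails to parse because `id`, `(m1,m2)` and `div` are fields, whereas JOIN expects a relation alias before each BY.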

示例:

students = 
1 rob
2 john 
3 fred

gpas =  
1 3.2 
2 3.8 
3 4.0

A = join students by id, gpas by id;

A =  
1 rob 1 3.2 
2 john 2 3.8 
3 fred 3 4.0
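
Applied to the files in the question, the same idea might look like the sketch below. Note that the sample rows share no column, so there is nothing to join on as they stand. The sketch therefore assumes the string file is extended with a leading id column (for example `1,Anju,IT,A`) and stored under the hypothetical name 'String_with_id'. The declared types are also assumptions.

```pig
-- Assumption: the string file now starts with an id column shared with 'num'.
StringFile = LOAD 'String_with_id' USING PigStorage(',')
             AS (id:int, name:chararray, branch:chararray, div:chararray);
NumFile    = LOAD 'num' USING PigStorage(',')
             AS (id:int, m1:double, m2:double, m3:double, m4:double);

-- Join the two relations on their common key.
joined = JOIN NumFile BY id, StringFile BY id;

-- Reorder the columns into the layout requested in the question.
-- The relation::field form disambiguates fields after the join.
result = FOREACH joined GENERATE
             NumFile::id, StringFile::name, NumFile::m1, NumFile::m2,
             StringFile::branch, StringFile::div, NumFile::m3, NumFile::m4;

STORE result INTO 'joinedfile' USING PigStorage(',');
```

The join itself only matches rows. The interleaved column order comes from the FOREACH ... GENERATE projection that follows it.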