Using tuples in Pig

Posted: 2017-06-26 17:03:13

Tags: hadoop apache-pig

I am new to Pig and am trying to understand the tuple data type. I have a file as follows:

       cat student.csv
id,name,grade,contact_details 
s1234,Mohan,8,(Delhi,9811830)
s2345,Nisha,10,(Delhi,257891)
s3456,Anuj,12,(Delhi,9897212)
s4567,vishal,14,(Delhi,989175)

contact_details is a tuple made up of city and phone.

I have loaded it into the following relation:

    student = load 'student.csv' using PigStorage(',') as
 (id:chararray,
  name:chararray,
  grade:int,
  contact: tuple(city:chararray,phone:chararray));

Now when I dump the relation, the tuple does not appear in the output. Below is the output of dump student:

grunt> dump student; 
(s1234,Mukul,8,)
(s2345,Nikita,10,)
(s3456,Anuj,12,)
(s4567,vishu,14,)
grunt> 

grunt> describe student;
student: {id: chararray,name: chararray,grade: int,contact: (city: chararray,phone: chararray)}
Am I missing something?

1 answer:

Answer 0 (score: 0):

The delimiter ',' you are using causes the file to be loaded incorrectly, because ',' also appears inside the tuple. PigStorage(',') splits each row into five fields, so the fourth field is just '(Delhi', which cannot be cast to the declared tuple and therefore comes out as null. Either replace the ',' between fields (everywhere except inside the tuple) with a different delimiter, or simply load the data as 5 fields, strip the '(' and ')', and concatenate the city and phone fields to get contact_details.

Option 1: use ' ' (space) as the delimiter

id name grade contact_details 

s1234 Mohan 8 (Delhi,9811830)
s2345 Nisha 10 (Delhi,257891)
s3456 Anuj 12 (Delhi,9897212)
s4567 vishal 14 (Delhi,989175)

student = load 'student.csv' using PigStorage(' ') as (id:chararray, name:chararray,  grade:int,  contact: tuple(city:chararray,phone:chararray));
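Once the tuple survives the load, its fields can be projected out with dot notation. A minimal sketch (the contacts alias is just for illustration):

contacts = FOREACH student GENERATE id, contact.city AS city, contact.phone AS phone;
dump contacts;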

Option 2: use ',' as the delimiter

id,name,grade,contact_details 

s1234,Mohan,8,(Delhi,9811830)
s2345,Nisha,10,(Delhi,257891)
s3456,Anuj,12,(Delhi,9897212)
s4567,vishal,14,(Delhi,989175)


student = load 'student.csv' using PigStorage(',') as (id:chararray, name:chararray, grade:int, city:chararray, phone:chararray);
-- strip the '(' from city and the ')' from phone (REPLACE is regex-based, so the parentheses are escaped), then concatenate them
student_new = FOREACH student GENERATE id, name, grade, CONCAT(REPLACE(CONCAT(city,' '),'\\(',''), REPLACE(phone,'\\)','')) AS contact_details;
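If you would rather end up with a real tuple column instead of a concatenated chararray, the built-in TOTUPLE function can rebuild contact from the cleaned fields. A minimal sketch under the same five-field load (the student_tuple alias is just for illustration):

student_tuple = FOREACH student GENERATE id, name, grade, TOTUPLE(REPLACE(city,'\\(',''), REPLACE(phone,'\\)','')) AS contact;
describe student_tuple;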