I am new to Pig. I have the following output.
(001,Kumar,Jayasuriya,1123456754,Matara)
(001,Kumar,Sangakkara,112722892,Kandy)
(001,Rajiv,Reddy,9848022337,Hyderabad)
(002,siddarth,Battacharya,9848022338,Kolkata)
(003,Rajesh,Khanna,9848022339,Delhi)
(004,Preethi,Agarwal,9848022330,Pune)
(005,Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(006,Archana,Mishra,9848022335,Chennai)
(007,Kumar,Dharmasena,758922419,Colombo)
(008,Mahela,Jayawerdana,765557103,Colombo)
How can I create a map from the above so that the output looks like this?
001#{(Kumar,Jayasuriya,1123456754,Matara),(Kumar,Sangakkara,112722892,Kandy),(001,Rajiv,Reddy,9848022337,Hyderabad)}
002#{(siddarth,Battacharya,9848022338,Kolkata)}
I tried the TOMAP function.
mapped_students = FOREACH students GENERATE TOMAP($0,$1..);
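But I cannot dump the output of the above command, because the process throws an error and stops there. Any help would be greatly appreciated.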
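Answer 0: (score: 0)
I think you are trying to collect the records that share the same id into one group of tuples.
The TOMAP function converts key/value expression pairs into a map, so it cannot group the remaining records for you, and using it this way leads to errors such as "Unable to open iterator for alias".
Here is some code that produces output close to what you need.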
A = LOAD 'path_to_data/data.txt' USING PigStorage(',') AS (id:chararray,first:chararray,last:chararray,phone:chararray,city:chararray);
If you do not want to supply a schema, then:
A = LOAD 'path_to_data/data.txt' USING PigStorage(',');
B = GROUP A BY $0; -- groups all records by the first column
DESCRIBE B; -- prints the schema of B
DUMP B;
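GROUP keeps the id inside every tuple of the bag, so the result is not quite the 001#{...} shape asked for. The following is a minimal sketch of one way to get closer, assuming the data is loaded with the named schema shown earlier. The aliases C and D and the field name details are only illustrative. It projects the id out of the bag and then passes the group key and the bag to TOMAP as a single key/value pair. The order of tuples inside each bag is not guaranteed.

```pig
-- Load with a schema so the fields can be referenced by name
A = LOAD 'path_to_data/data.txt' USING PigStorage(',')
    AS (id:chararray, first:chararray, last:chararray, phone:chararray, city:chararray);

-- One row per id, with a bag named A holding that id's records
B = GROUP A BY id;

-- Keep only the non-id fields inside the bag
C = FOREACH B GENERATE group AS id, A.(first, last, phone, city) AS details;

-- Build a one-entry map per row: key = id, value = bag of detail tuples
D = FOREACH C GENERATE TOMAP(id, details);

DUMP D;
```

If a plain (id, bag) pair is enough, DUMP C can be used instead and the TOMAP step skipped.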
Hope this helps.