I have the following in the variable data_1:
describe data_1;
output:
group_2: {group: (age: int,phone: chararray),group_1: {(group: (age: int,phone: chararray,id: int),student_details: {(id: int,firstname: chararray,lastname: chararray,age: int,phone: chararray,city: chararray)})}}
and
DUMP data_1;
output:
(21,9848022330) {((21,9848022330,4),{(4,Preethi,Agarwal,21,9848022330,London)})}
(21,9848022337) {((21,9848022337,1),{(1,Rajiv,Reddy,21,9848022337,Paris)})}
(22,9848022338) {((22,9848022338,2),{(2,siddarth,Battacharya,22,9848022338,Kolkata)})}
(22,9848022339) {((22,9848022339,3),{(3,Rajesh,Khanna,22,9848022339,Delhi)})}
(23,9848022335) {((23,9848022335,6),{(6,Archana,Mishra,23,9848022335,Chennai)})}
(23,9848022336) {((23,9848022336,5),{(5,Trupthi,Mohanthy,23,9848022336,Bhuwaneshwar)})}
(24,9848022333) {((24,9848022333,7),{(7,Komal,Nayak,24,9848022333,trivendram)}),((24,9848022333,8),{(8,Bharathi,Nambiayar,24,9848022333,Chennai)})}
(111,9834534343) {((111,9834534343,9),{(9,ABC,DEF,111,9834534343,Delhi1),(9,ABC,DEF,111,9834534343,Delhi2),(9,ABC,DEF,111,9834534343,Delhi3)})}
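I want to remove the extra bag and tuple and keep only the $1.$1 bag.
I tried to achieve this with something like
group_2_normal = FOREACH data_1 GENERATE $0.age,$0.phone,$1.$1;
but I still cannot remove the extra bag and tuple wrapped around the $1.$1 bag.
The output of the FOREACH statement above is: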
21 9848022330 {({(4,Preethi,Agarwal,21,9848022330,London)})}
21 9848022337 {({(1,Rajiv,Reddy,21,9848022337,Paris)})}
But the expected output is:
21 9848022330 {(4,Preethi,Agarwal,21,9848022330,London)}
21 9848022337 {(1,Rajiv,Reddy,21,9848022337,Paris)}
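Answer 0 (score: 0)
I think FLATTEN will help you here. As long as the outer bag ($1) contains only one tuple, it will give you exactly what you want; with more than one tuple you get one output row per tuple.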
group_2_normal = FOREACH data_1 GENERATE $0.age,$0.phone,FLATTEN($1.$1);
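The same statement can probably be written with field names instead of positions, which makes it easier to see which level FLATTEN removes. This is only a sketch: it assumes data_1 has exactly the schema printed by describe above (fields named group, group_1 and student_details), and group_2_by_name is a hypothetical alias introduced here, not something from the question.

```pig
-- Sketch only: assumes data_1 has the schema shown by `describe data_1`,
-- i.e. group: (age, phone) and group_1: a bag of tuples whose second
-- field is the student_details bag.
-- group_2_by_name is a hypothetical alias used for illustration.
group_2_by_name = FOREACH data_1 GENERATE
    group.age,
    group.phone,
    -- group_1.student_details is a bag of one-field tuples, each holding a bag;
    -- FLATTEN un-nests only that outer bag, so each row keeps one student_details bag.
    FLATTEN(group_1.student_details);

DUMP group_2_by_name;
```

With the sample data, the group (24,9848022333) has two tuples in group_1, so it should come out as two rows, one per student, rather than one row with both students in a single bag. The group (111,9834534343) should stay a single row whose bag holds all three tuples, because FLATTEN strips only one level of nesting.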