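What I want to accomplish with a Pig Latin script is to fill in Col4, which holds values such as "JKL", "PQR", and so on in some rows and is blank in the rest. Each blank row must simply copy the value from the previous non-blank cell in Col4. Please see the example below.
The target table should look like this: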
Answer 0: (score: 1)
If you are asking how to set Col4 to XYZ for every record where it is null or empty, you can do that with the following snippet:
--Load input data
input_data = LOAD 'input.txt' USING PigStorage() AS (Col1:chararray, Col2:int, Col3:int, Col4:chararray);
--Perform operation on each record
input_data = FOREACH input_data GENERATE Col1, Col2, Col3, ((Col4 is null or TRIM(Col4) == '') ? 'XYZ' : Col4) as Col4;
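This assumes your data is held in input_data. For each record, it checks whether Col4 is null or empty (after trimming whitespace). If so, it replaces it with the desired value (XYZ); otherwise it keeps the existing value.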
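The per-record check can also be read as a small function. Below is a minimal Python sketch of the same logic, not part of the original answer; the name `fill_col4` and its `default` parameter are hypothetical and chosen only for illustration. Note that it fills a fixed value and does not copy the previous row's value.

```python
from typing import Optional


def fill_col4(col4: Optional[str], default: str = "XYZ") -> str:
    """Return `default` when col4 is missing or blank, else col4 unchanged.

    Mirrors the Pig expression
    ((Col4 is null or TRIM(Col4) == '') ? 'XYZ' : Col4).
    """
    if col4 is None or col4.strip() == "":
        return default
    return col4


print(fill_col4(None))    # XYZ
print(fill_col4("   "))   # XYZ
print(fill_col4("JKL"))   # JKL
```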
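Answer 1: (score: 0)
Is Col1 the same for all the rows that should share a Col4 value? If so, use two filtered relations and join them; otherwise you will have to find the unique (Col1, Col4) pairs and remove the NULL values.
In the steps below, Filter_one captures Col1 and Col4 for the rows where Col4 is not NULL.
Filter_two captures Col1, Col2, and Col3. The two are then joined on Col1:
Filter_two supplies the 1st, 2nd, and 3rd columns of the output,
and the second column of Filter_one goes in the 4th position.
Hope this helps.
The Pig script would be as follows: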
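-- Load_Data is assumed to be the relation already loaded from the input file
-- Col1 with its Col4 value, keeping only rows where Col4 is present
Filter_one = foreach Load_Data generate $0 as col1, $3 as col4;
Filter_one_temp = filter Filter_one by ($1 is not null);
-- Col1, Col2, Col3 for every row
Filter_two = foreach Load_Data generate $0 as col1, $1 as col2, $2 as col3;
-- Left join on Col1; after the join, $3 is Filter_one_temp's col1 and $4 is its col4
Join_filter = JOIN Filter_two by $0 LEFT, Filter_one_temp by $0;
generate_output = foreach Join_filter generate $0 as col1, $1 as col2, $2 as col3, $4 as col4;
store generate_output into 'dfs_path' using PigStorage(',');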
Because the result is stored with ',' as the delimiter, the output will look like:
(ABC,34,23,XYZ)
(ABC,12,78,XYZ)
(ABC,4,21,XYZ)
(ABC,22,54,XYZ)
(DEF,32,455,JKL)
(DEF,21,45,JKL)
(DEF,45,687,JKL)
(DEF,232,565,JKL)
(DEF,23,32,JKL)
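The join approach above can be sketched in plain Python to make the data flow easier to follow. This is an illustration under stated assumptions rather than the answer's own code: the function name `fill_col4_by_key` and the sample input rows are hypothetical (the original input table is not shown), the sketch treats blank strings as missing in addition to None, and it keeps only the first Col4 value seen per Col1, whereas the Pig join would emit one output row per matching Col4 row.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int, int, Optional[str]]


def fill_col4_by_key(rows: List[Row]) -> List[Row]:
    """Fill Col4 for every row from the non-missing Col4 of the same Col1."""
    # Counterpart of Filter_one_temp: Col1 -> Col4 for rows where Col4 is present.
    lookup: Dict[str, str] = {}
    for col1, _, _, col4 in rows:
        if col4 is not None and col4.strip() != "":
            lookup.setdefault(col1, col4)  # keep the first value seen per key

    # Counterpart of the left join: keep Col1..Col3 and attach the looked-up Col4.
    # Keys with no known Col4 stay None, as in a left outer join.
    return [(c1, c2, c3, lookup.get(c1)) for c1, c2, c3, _ in rows]


# Hypothetical input: only the first row of each group carries Col4.
rows: List[Row] = [
    ("ABC", 34, 23, "XYZ"),
    ("ABC", 12, 78, None),
    ("DEF", 32, 455, "JKL"),
    ("DEF", 21, 45, ""),
]
filled = fill_col4_by_key(rows)
for row in filled:
    print(row)
```

This only works when Col1 identifies the group that shares a Col4 value. If the same Col1 can carry different Col4 values at different positions, a key-based join cannot reproduce "copy the previous cell", and an order-aware fill would be needed instead.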