For example, I have this data:
x, 23
y, 492
v, 2034
x, 45
z, 25
v, 29
I want to transform it into:
x, 23, 45
y, 492
v, 2034, 29
z, 25
The result would be equivalent to printing a hash table that maps each key to all of its values.
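To pin down the expected result, here is a minimal Python sketch of that hash table, run outside Pig. `group_rows` and `format_table` are hypothetical helper names used only for illustration; the sketch keeps keys in first-seen order, which matches the desired output above.

```python
from collections import defaultdict

# Input rows from the question, as (key, value) pairs.
rows = [("x", 23), ("y", 492), ("v", 2034), ("x", 45), ("z", 25), ("v", 29)]


def group_rows(pairs):
    """Collect every value under its key (a multimap).

    defaultdict is a dict subclass, so keys keep first-seen order.
    """
    table = defaultdict(list)
    for key, value in pairs:
        table[key].append(value)
    return dict(table)


def format_table(table):
    """Render each key followed by its values, comma-separated."""
    return [", ".join([key] + [str(v) for v in values]) for key, values in table.items()]


for line in format_table(group_rows(rows)):
    print(line)
```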
Here is my current script:
logs = LOAD 'tmp' using MyLoader (Parameters) as
(x:bytearray, y:bytearray, z, x1, y1:bytearray, z1:long, x2:bytearray,
z2:bytearray, z3:bytearray, z4:float, dataMap:map[],
recs:bag{(record:bytearray)}, key:bytearray, colo:bytearray);
filtered_logs = foreach logs {
info = FILTER recs BY record MATCHES 'FIRST_REGEX';
info_records = FOREACH info GENERATE GET_FIELDS($0) as
rec:tuple(mClass:bytearray, rType:bytearray,
rName:bytearray, rStatus:bytearray, rDuration:float,
rData:bytearray, rDataMap:map[]);
name = FOREACH info_records GENERATE rec.rName;
matching_requests = FILTER recs BY record MATCHES 'SECOND_REGEX';
GENERATE FLATTEN(name) as client_name:chararray,
dataMap#'corr_id_', (SIZE(matching_requests) > 0 ? true : false)
as matched:boolean;
}
A = FILTER filtered_logs BY matched;
key_corr_id = foreach A generate (chararray) $1 as key, (chararray) $2 as corr_id;
id_group = group key_corr_id by key; -- ERROR thrown when this line is included.
STORE id_group into '$output' using
org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'YES_MULTILINE');
The following error is thrown:
java.lang.ClassCastException: org.apache.pig.data.DataByteArrayString cannot be cast to java.lang.String
Answer:
There is no need to create a new relation and do a join. Just group by the key and dump the relation:
key_corr_id = foreach A generate (chararray) $1 as key:chararray, (chararray) $2 as corr_id:chararray;
id_group = group key_corr_id by key;
dump id_group;
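Note that GROUP does not merge the values by itself: it emits one record per distinct key, whose first field is named `group` and whose second field is a bag (named after the input relation, here `key_corr_id`) holding the complete input tuples. The Python sketch below only mimics that output shape; `pig_style_group` is a hypothetical helper, and unlike this sketch, Pig does not guarantee the order in which groups come out.

```python
# (key, corr_id) rows standing in for the key_corr_id relation.
rows = [("x", "23"), ("y", "492"), ("v", "2034"), ("x", "45"), ("z", "25"), ("v", "29")]


def pig_style_group(relation, key_index=0):
    """Mimic the shape of Pig's GROUP: one (group, bag) pair per distinct key.

    The bag keeps the whole input tuple, key field included.
    """
    groups = {}
    for tup in relation:
        groups.setdefault(tup[key_index], []).append(tup)
    return list(groups.items())


for group, bag in pig_style_group(rows):
    print(group, bag)
```

This is why a dump shows the key repeated inside each bag, and why the values still need to be projected and joined to get `x,23,45`.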
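If you don't want the grouped tuples (for key x the dump looks like (x,{(x,23),(x,45)})) but want the values separated as x,23,45, add one more step that runs BagToString over the corr_id values in each group, like this: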
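-- after GROUP the key field is named 'group' and the bag is named after the grouped relation
final = foreach id_group generate group as key, BagToString(key_corr_id.corr_id, ',') as corr_ids;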
dump final;
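The sketch below is a rough Python analogue of that last step, assuming BagToString concatenates every field of every tuple in the bag with the given delimiter. `project` and `bag_to_string` are hypothetical stand-ins for the `key_corr_id.corr_id` projection and the built-in UDF; they are not Pig APIs.

```python
# Two groups in the (group, bag) shape that GROUP produces.
grouped = [
    ("x", [("x", "23"), ("x", "45")]),
    ("y", [("y", "492")]),
]


def project(bag, field_index):
    """Like key_corr_id.corr_id: keep a single field from each tuple in the bag."""
    return [(tup[field_index],) for tup in bag]


def bag_to_string(bag, delimiter):
    """Join every field of every tuple in the bag with the delimiter."""
    return delimiter.join(str(field) for tup in bag for field in tup)


final = [(group, bag_to_string(project(bag, 1), ",")) for group, bag in grouped]
print(final)
```

Projecting first matters: in this sketch, calling `bag_to_string` on the unprojected bag for x would give `x,23,x,45`, with the key mixed in among the values.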