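In the following Pig script, my value for ct "disappears" when I run DUMP on any step after the GENERATE that sets the e3 alias. For example, if I run DUMP on e4 immediately after setting that alias, no value is returned for ct.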
I also see the following warning in my output:
[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 9 time(s).
eng_grp = GROUP engs BY (aid, scm_id,ts,etype);
eng_grp_out = FOREACH eng_grp
GENERATE
group.aid as aid,
group.scm_id as scm_id,
group.etype as etype,
group.ts as timestamp,
(long)COUNT_STAR(engs) as ct;
eng_joined = JOIN eng_grp_out BY (aid,scm_id), tgc BY (aid, scm_id);
e3 = FOREACH eng_joined GENERATE
MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
eng_grp_out::aid as v,
eng_grp_out::scm_id as scmid,
eng_grp_out::etype AS et,
eng_grp_out::timestamp as ts,
FLATTEN(tgc::tags),
eng_grp_out::ct as ct;
-- the value for "ct" will be output if I do DUMP e3; here
e4 = FOREACH e3 GENERATE
id,
v,
scmid,
et,
ts,
FLATTEN(tgc::tags::g) as gg,
ct;
-- the value for "ct" will NOT be output if I do DUMP e4; here
e5 = FOREACH e4 GENERATE
id,
v,
scmid,
et,
ts,
gg#'g' as tg,
gg#'v' as tv,
gg#'d' as td,
ct;
e6 = FOREACH e5 GENERATE
id,
v,
scmid,
et,
(long)ts,
tg#'\$oid' as tg,
tv#'\$oid' as tv,
(chararray)td as td,
ct;
e7 = FOREACH e6 GENERATE
id,
v,
scmid,
et,
ts,
'c' as tt,
tg,
tv,
td,
ct;
e8 = FOREACH e7 GENERATE
id,v,scmid,et,ts,tt,
CONCAT(CONCAT(CONCAT(CONCAT(tg,'_'),tv),'_'),td) as ct,
tg,tv,td,ct;
Answer 0 (score: 0)
I was able to work around this by changing the assignment of the e3 alias to:
e3 = FOREACH eng_joined GENERATE
-- ...kept everything else the same...
TOMAP('count_val', (long)eng_grp_out::ct);
From there I was able to get the value I needed in the e4 assignment with (long)$6#'count_val' as val.
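Put together, the workaround might look like the sketch below. It assumes eng_joined is defined exactly as in the question, and that FLATTEN(tgc::tags) yields a single field so that the map lands at position $6. That position is an assumption taken from the expression above, not something verified against the real data, so the DESCRIBE and the temporary DUMP are there to check it: comparing the declared schema with the tuples actually produced may also help explain the ACCESSING_NON_EXISTENT_FIELD warning.

```pig
-- Sketch only: assumes eng_joined exists as in the question's script.
e3 = FOREACH eng_joined GENERATE
    MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) AS id,
    eng_grp_out::aid AS v,
    eng_grp_out::scm_id AS scmid,
    eng_grp_out::etype AS et,
    eng_grp_out::timestamp AS ts,
    FLATTEN(tgc::tags),
    -- wrap the count in a map instead of emitting it as a bare long
    TOMAP('count_val', (long)eng_grp_out::ct);

-- Compare the declared schema with real tuples to confirm the map's position.
DESCRIBE e3;
-- DUMP e3;  -- uncomment temporarily while debugging

e4 = FOREACH e3 GENERATE
    id,
    v,
    scmid,
    et,
    ts,
    FLATTEN(tgc::tags::g) AS gg,
    -- $6 is assumed to be the map generated above; adjust if DESCRIBE says otherwise
    (long)$6#'count_val' AS val;
```

The later steps (e5 onward) would then carry val forward in place of ct.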