Why am I receiving ACCESSING_NON_EXISTENT_FIELD warnings after a JOIN and setting an alias?

Time: 2015-01-22 06:13:30

Tags: hadoop apache-pig

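In the following Pig script, my value for ct "disappears" when I run a DUMP on any step after the GENERATE that sets the e3 alias. For example, if I DUMP e4 immediately after setting that alias, no value is returned for ct.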

I also see the following warning in my output:

[main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 9 time(s).

   eng_grp = GROUP engs BY (aid, scm_id,ts,etype);
   eng_grp_out = FOREACH eng_grp
               GENERATE
                   group.aid as aid,
                   group.scm_id as scm_id,
                   group.etype as etype,
                   group.ts as timestamp,
                   (long)COUNT_STAR(engs) as ct;

   eng_joined = JOIN eng_grp_out BY (aid,scm_id), tgc BY (aid, scm_id);

   e3 = FOREACH eng_joined GENERATE
         MD5((chararray)CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(CONCAT(eng_grp_out::aid,'_'),eng_grp_out::scm_id),'_'),eng_grp_out::etype),'_'),(chararray)eng_grp_out::timestamp)) as id,
         eng_grp_out::aid as v,
         eng_grp_out::scm_id as scmid,
         eng_grp_out::etype AS et,
         eng_grp_out::timestamp as ts,
         FLATTEN(tgc::tags),
         eng_grp_out::ct as ct;

   -- the value for "ct" will be output if I do DUMP e3; here

   e4 = FOREACH e3 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         FLATTEN(tgc::tags::g) as gg,
         ct;
   -- the value for "ct" will NOT be output if I do DUMP e4; here
   e5 = FOREACH e4 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         gg#'g' as tg,
         gg#'v' as tv,
         gg#'d' as td,
         ct;

   e6 = FOREACH e5 GENERATE
         id,
         v,
         scmid,
         et,
         (long)ts,
         tg#'\$oid' as tg,
         tv#'\$oid' as tv,
         (chararray)td as td,
         ct;

   e7 = FOREACH e6 GENERATE
         id,
         v,
         scmid,
         et,
         ts,
         'c' as tt,
         tg,
         tv,
         td,
         ct;

   e8 = FOREACH e7 GENERATE
         id,v,scmid,et,ts,tt,
         CONCAT(CONCAT(CONCAT(CONCAT(tg,'_'),tv),'_'),td) as ct,
         tg,tv,td,ct;

1 Answer:

Answer 0: (score: 0)

I was finally able to get this working by changing the assignment of the e3 alias to:

   e3 = FOREACH eng_joined GENERATE
         -- ...kept everything else the same...
         TOMAP('count_val', (long)eng_grp_out::ct);

From there I was able to get the value I needed in the e4 assignment via (long)$6#'count_val' as val.
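
For illustration, the e4 step with this workaround might look roughly like the sketch below. It is a fragment, not a complete script: it assumes the relations from the question are already defined, that e3 ends with the TOMAP field, and that the map sits at position $6, which only holds if FLATTEN(tgc::tags) contributes exactly one field. The DESCRIBE statements are added here only to check what schema Pig has inferred at each step.

```pig
-- Sketch only: e3 is assumed to be built as in this answer, with
-- TOMAP('count_val', (long)eng_grp_out::ct) as its last generated field.

-- Print the schema Pig inferred for e3, to confirm where the map ends up.
DESCRIBE e3;

e4 = FOREACH e3 GENERATE
      id,
      v,
      scmid,
      et,
      ts,
      FLATTEN(tgc::tags::g) as gg,
      -- $6 is the position used in this answer; treat it as an assumption
      -- and adjust it if DESCRIBE e3 shows the map at a different position.
      (long)$6#'count_val' as val;

-- Confirm that val now appears in the schema with type long.
DESCRIBE e4;
```

If the later steps (e5 onward) are kept as in the question, they would need to reference val instead of ct, or the last field above could be aliased as ct.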