Here is my requirement:
Input:
0104919 ,08476,48528,2016,2016-08-29
00104919 ,08476,48528,2016,2016-09-05
00104919 ,08476,48528,2016,2016-09-12
00104919 ,08476,48528,2017,2016-08-29
The output after the join should be:
2,00104919 ,08476,48528,2016,2016-09-05,2016-09-12
3,00104919 ,08476,48528,2016,2016-09-12,2016-08-29
Here is my code:
TABL = LOAD '/TABL/part-r-00000' using PigStorage('~') AS (a,b,c,d,e,f);
pre_Q1 = FOREACH TABL generate a,b,c,d,e;
DIST = DISTINCT pre_Q1;
ORDR = ORDER DIST BY *;
Q1 = rank ORDR;
Q2 = FOREACH Q1 GENERATE rank_ORDR + 1 AS rank_Q2, a, b, c, d, e;
Q_join = join Q2 by (rank_Q2, a, b, c, d), Q1 by (rank_ORDR, a, b, c, d);
C = limit Q_join 100;
dump C;
I get the following error. Can anyone point out what is causing it?
Failed Jobs:
JobId Alias Feature Message Outputs
job_1474127474437_528208 C,Q2,Q_join HASH_JOIN Message: Job failed!
Input(s):
Successfully read 5235587 records (1516199217 bytes) from: "/TABL/part-r-00000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1474127474437_528166 -> job_1474127474437_528185,
job_1474127474437_528185 -> job_1474127474437_528190,
job_1474127474437_528190 -> job_1474127474437_528204,
job_1474127474437_528204 -> job_1474127474437_528206,
job_1474127474437_528206 -> job_1474127474437_528208,
job_1474127474437_528208 -> null,
null
2017-01-04 04:02:37,407 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,569 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,729 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,887 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2017-01-04 04:02:37,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2017-01-04 04:02:37,945 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias C
Details at logfile: /var/log/gphd/pig/pig.log
Answer 0 (score: 0)
Try changing the first line as follows, so that the delimiter matches the comma-separated input:
TABL = LOAD '/TABL/part-r-00000' using PigStorage(',') AS (a,b,c,d,e,f);
Also watch out for the space at the end of column a; it may affect the join!
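A minimal sketch of the two suggestions combined, assuming the file is comma-separated as in the sample input and that the values in column a carry a trailing space. The chararray types are an assumption added here so that the built-in TRIM function can be applied; the original script declares no types. Whether this alone resolves the failed job is not certain, since the log does not show the underlying cause.

```pig
-- Load with ',' as the delimiter (the original script used '~')
TABL = LOAD '/TABL/part-r-00000' USING PigStorage(',')
       AS (a:chararray, b:chararray, c:chararray, d:chararray, e:chararray, f:chararray);

-- Strip leading/trailing whitespace from a so that '00104919 ' and '00104919' compare equal
pre_Q1 = FOREACH TABL GENERATE TRIM(a) AS a, b, c, d, e;

-- Inspect a few rows before running the rest of the pipeline
sample_rows = LIMIT pre_Q1 10;
DUMP sample_rows;
```

Checking a handful of rows with DUMP first makes it easier to confirm that the fields are split as expected before the RANK and JOIN steps run over the full input.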