I have a very large input file with data in the following format:
id1 id2 id3 id4
The file is very large, about 10 million lines.
The Pig script I wrote is:
A = LOAD '/input' USING PigStorage(' ');
B = FOREACH A GENERATE $2 AS id3;
id_group = GROUP B BY id3;
count_id = FOREACH id_group GENERATE group, COUNT(B.id3);
STORE count_id INTO 'statistic';
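In case it matters, here is the same script with an explicit schema (the field names id1..id4 are just taken from the sample line above; the logic is unchanged):

A = LOAD '/input' USING PigStorage(' ')
        AS (id1:chararray, id2:chararray, id3:chararray, id4:chararray);
B = FOREACH A GENERATE id3;                               -- keep only the column we group on
id_group = GROUP B BY id3;
count_id = FOREACH id_group GENERATE group, COUNT(B.id3); -- count occurrences of each id3
STORE count_id INTO 'statistic';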
The script succeeds on small files, but when I run it on the large input it fails. It shows:
2013-10-10 23:25:01,655 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1% complete
2013-10-10 23:25:05,686 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 3% complete
2013-10-10 23:27:52,894 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201309291007_0201 has failed! Stop running all dependent jobs
2013-10-10 23:27:52,894 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-10-10 23:27:52,916 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-10-10 23:27:52,918 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.1 0.10.0 bhbz 2013-10-10 23:24:48 2013-10-10 23:27:52 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201309291007_0201 A,B,count_id,id_group GROUP_BY,COMBINER Message: Job failed! Error - NA hdfs://h1061.mzhen.cn:9000/user/bhbz/statistic1,
Input(s):
Failed to read data from "/dataSet/public.mbm.3.0"
Output(s):
Failed to produce result in "hdfs://h1061.mzhen.cn:9000/user/bhbz/statistic1"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201309291007_0201
2013-10-10 23:27:52,918 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-10-10 23:27:52,932 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
Is it because GROUP is using too much memory?
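If it is a memory problem, would a variant like the following help? This is only a sketch, not something I have verified on my cluster: default_parallel and pig.cachedbag.memusage are documented Pig settings, and the values 20 and 0.1 are arbitrary examples I picked.

SET default_parallel 20;          -- more reducers, so each one handles fewer group keys
SET pig.cachedbag.memusage 0.1;   -- spill bags to disk earlier instead of holding them in memory
A = LOAD '/input' USING PigStorage(' ');
B = FOREACH A GENERATE $2 AS id3; -- project down to the grouped column before the GROUP
id_group = GROUP B BY id3;
count_id = FOREACH id_group GENERATE group, COUNT(B.id3);
STORE count_id INTO 'statistic';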