我使用8个地图任务和1个减少任务。尽管已成功完成所有映射任务尝试,但map reduce作业失败。我的示例代码来自为跳过数据运行的Hadoop初学者指南(Garry Turkington)。程序的主要思想是测试地图中的任务失败减少。虽然导致失败的数据(示例中为skiptext)具有源文件,但map reduce可以成功完成工作。但是,我没有完成工作,遇到工作失败。我该怎么办?
完整的源代码是:
import java.io.IOException;
import org.apache.hadoop.conf.* ;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.* ;
import org.apache.hadoop.mapred.* ;
import org.apache.hadoop.mapred.lib.* ;
public class SkipData
{
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, Text, LongWritable>
{
private final static LongWritable one = new
LongWritable(1);
private Text word = new Text("totalcount");
public void map(LongWritable key, Text value,
OutputCollector<Text, LongWritable> output,
Reporter reporter) throws IOException
{
String line = value.toString();
if (line.equals("skiptext"))
throw new RuntimeException("Found skiptext") ;
output.collect(word, one);
}
}
public static void main(String[] args) throws Exception
{
Configuration config = new Configuration() ;
JobConf conf = new JobConf(config, SkipData.class);
conf.setJobName("SkipData");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(LongWritable.class);
conf.setMapperClass(MapClass.class);
conf.setCombinerClass(LongSumReducer.class);
conf.setReducerClass(LongSumReducer.class);
FileInputFormat.setInputPaths(conf,args[0]) ;
FileOutputFormat.setOutputPath(conf, new
Path(args[1])) ;
JobClient.runJob(conf);
}
}
完整的错误控制台是:
18/02/28 21:12:58 INFO mapreduce.Job: Job job_local724352166_0001 failed with state FAILED due to: NA
18/02/28 21:12:58 WARN mapred.LocalJobRunner: job_local724352166_0001
java.lang.Exception: java.lang.RuntimeException: Found skiptext
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: Found skiptext
at mapredpack.SkipTest$MapClass.map(SkipTest.java:23)
at mapredpack.SkipTest$MapClass.map(SkipTest.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner .java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/02/28 21:12:58 DEBUG security.UserGroupInformation: PrivilegedAction as:naychi (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:758)
18/02/28 21:12:59 DEBUG security.UserGroupInformation: PrivilegedAction as:naychi (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
18/02/28 21:12:59 INFO mapreduce.Job: Counters: 23
File System Counters
FILE: Number of bytes read=29905
FILE: Number of bytes written=2020669
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=128005127
HDFS: Number of bytes written=0
HDFS: Number of read operations=80
HDFS: Number of large read operations=0
HDFS: Number of write operations=7
Map-Reduce Framework
Map input records=1542671
Map output records=1542669
Map output bytes=29310711
Map output materialized bytes=135
Input split bytes=686
Combine input records=1161148
Combine output records=5
Spilled Records=5
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=8601
Total committed heap usage (bytes)=3840933888
File Input Format Counters
Bytes Read=23163911
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
at mapredpack.SkipTest.main(SkipTest.java:58)
18/02/28 21:12:59 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@2e55dd0c
18/02/28 21:12:59 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@2e55dd0c
18/02/28 21:12:59 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@2e55dd0c
18/02/28 21:12:59 DEBUG ipc.Client: Stopping client
18/02/28 21:12:59 DEBUG ipc.Client: IPC Client (1313916817) connection to localhost/127.0.0.1:9000 from naychi: closed
18/02/28 21:12:59 DEBUG ipc.Client: IPC Client (1313916817) connection to localhost/127.0.0.1:9000 from naychi: stopped, remaining connections 0
答案 0 :(得分:1)
看起来代码按设计工作。找到了skiptext行,并且实现了该作业以在该情况下抛出任务结束异常。这是一种常见的编码技术,迫使人们在某一点实现逻辑。抛出RuntimeException(),需要修改代码,开发人员不得不查看代码的那一部分。
查看代码并确定在skiptext行的情况下您想要做什么。是否需要实现其他逻辑,替换异常?如果是,则用正确的行为替换抛出的异常。