The Reducer is not being invoked in my MapReduce job

Asked: 2017-03-08 08:25:34

Tags: scala hadoop mapreduce

This is a word-count MapReduce job, and I have my own InputFormat.

The JobExecutor:

val job = new Job(new Configuration())

job.setMapperClass(classOf[CountMapper])
job.setReducerClass(classOf[CountReducer])

job.setJobName("tarun-test-1")
job.setInputFormatClass(classOf[MyInputFormat])
FileInputFormat.setInputPaths(job, new Path(args(0)))
FileOutputFormat.setOutputPath(job, new Path(args(1)))

job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[LongWritable])

job.setNumReduceTasks(1)

println("status: " + job.waitForCompletion(true))

The Mapper:

class CountMapper extends Mapper[LongWritable, Text, Text, LongWritable] {

    private val valueOut = new LongWritable(1L)

    override def map(k: LongWritable, v: Text, context: Mapper[LongWritable, Text, Text, LongWritable]#Context): Unit = {
        val str = v.toString
        str.split(",").foreach(word => {
            val keyOut = new Text(word.toLowerCase.trim)
            context.write(keyOut, valueOut)
        })
    }
}

The Reducer:

class CountReducer extends Reducer[Text, LongWritable, Text, LongWritable] {

    override def reduce(k: Text, values: Iterable[LongWritable], context: Reducer[Text, LongWritable, Text, LongWritable]#Context): Unit = {
        println("Inside reduce method..")
        val valItr = values.iterator()
        var sum = 0L
        while (valItr.hasNext) {
            sum = sum + valItr.next().get()
        }

        context.write(k, new LongWritable(sum))
        println("done reducing.")
    }
}

The Mapper is being invoked, and according to the logs the RecordReader is reading the splits correctly. However, the reducer is never called.

1 answer:

Answer 0 (score: 0)

Try setting `job.setMapOutputKeyClass` and `job.setMapOutputValueClass`.
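A minimal sketch of where those calls would go in the driver code from the question. This uses the standard Hadoop `Job` setters; if the map output key/value classes are not set explicitly, the framework falls back to the job's final output classes, so declaring them makes the shuffle's expected types unambiguous.

```scala
// Sketch only: same driver as in the question, with the map output
// types declared explicitly before submitting the job.
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[LongWritable])

// Suggested fix: tell the framework what the Mapper emits,
// so the shuffle/sort phase deserializes the intermediate
// (Text, LongWritable) pairs with the right classes.
job.setMapOutputKeyClass(classOf[Text])
job.setMapOutputValueClass(classOf[LongWritable])

println("status: " + job.waitForCompletion(true))
```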