I have a simple MapReduce job that uses the default mapper and reducer. The input is a few text files, and I am running Hadoop 2.x in pseudo-distributed mode. What worries me is that even though I set mapred.reduce.tasks=2, only one reducer is ever invoked.
package org.priya.sort;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestingReduce extends Configured implements Tool {

    @Override
    public int run(String[] arg0) throws Exception {
        System.out.println("###########I am in TestingReduce###########");
        Job job = Job.getInstance(getConf());
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setJarByClass(TestingReduce.class);
        System.out.println("#########The number of reducers :: " + job.getNumReduceTasks());
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/totalOrderOutput"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int i = ToolRunner.run(new TestingReduce(), args);
        System.out.println("Return value is " + i);
    }
}
I run the job with the following command:

hadoop jar TestingReducer.jar -D mapred.reduce.tasks=2

The console output is:
###########I am in TestingReduce###########
OpenJDK 64-Bit Server VM warning: You have loaded library /home/priya/workspace/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/07/06 15:24:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#########The number of reducers :: 2
14/07/06 15:24:48 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/07/06 15:24:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/07/06 15:24:49 INFO input.FileInputFormat: Total input paths to process : 3
14/07/06 15:24:50 INFO mapreduce.JobSubmitter: number of splits:3
14/07/06 15:24:50 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/07/06 15:24:50 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/07/06 15:24:50 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/07/06 15:24:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1851811203_0001
14/07/06 15:24:51 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/staging/priya1851811203/.staging/job_local1851811203_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/07/06 15:24:51 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/staging/priya1851811203/.staging/job_local1851811203_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/07/06 15:24:52 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/local/localRunner/priya/job_local1851811203_0001/job_local1851811203_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/07/06 15:24:52 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/local/localRunner/priya/job_local1851811203_0001/job_local1851811203_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/07/06 15:24:52 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/07/06 15:24:52 INFO mapreduce.Job: Running job: job_local1851811203_0001
14/07/06 15:24:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/07/06 15:24:52 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/07/06 15:24:53 INFO mapred.LocalJobRunner: Waiting for map tasks
14/07/06 15:24:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000000_0
14/07/06 15:24:53 INFO mapreduce.Job: Job job_local1851811203_0001 running in uber mode : false
14/07/06 15:24:53 INFO mapreduce.Job: map 0% reduce 0%
14/07/06 15:24:53 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:53 INFO mapred.MapTask: Processing split: hdfs://localhost/input/2.txt:0+15
14/07/06 15:24:53 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:53 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:53 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:53 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:53 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:53 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:54 INFO mapred.LocalJobRunner:
14/07/06 15:24:54 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:54 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufend = 79; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214368(104857472); length = 29/6553600
14/07/06 15:24:54 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:54 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000000_0 is done. And is in the process of committing
14/07/06 15:24:54 INFO mapred.LocalJobRunner: map
14/07/06 15:24:54 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000000_0' done.
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000000_0
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000001_0
14/07/06 15:24:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:54 INFO mapred.MapTask: Processing split: hdfs://localhost/input/1.txt:0+10
14/07/06 15:24:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:54 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:54 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:54 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:54 INFO mapreduce.Job: map 100% reduce 0%
14/07/06 15:24:54 INFO mapred.LocalJobRunner:
14/07/06 15:24:54 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:54 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufend = 50; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
14/07/06 15:24:54 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:54 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000001_0 is done. And is in the process of committing
14/07/06 15:24:54 INFO mapred.LocalJobRunner: map
14/07/06 15:24:54 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000001_0' done.
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000001_0
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000002_0
14/07/06 15:24:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:54 INFO mapred.MapTask: Processing split: hdfs://localhost/input/3.txt:0+10
14/07/06 15:24:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:54 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:54 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:54 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:55 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:55 INFO mapred.MapTask: bufstart = 0; bufend = 50; bufvoid = 104857600
14/07/06 15:24:55 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
14/07/06 15:24:55 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:55 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000002_0 is done. And is in the process of committing
14/07/06 15:24:55 INFO mapred.LocalJobRunner: map
14/07/06 15:24:55 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000002_0' done.
14/07/06 15:24:55 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000002_0
14/07/06 15:24:55 INFO mapred.LocalJobRunner: Map task executor complete.
14/07/06 15:24:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:55 INFO mapred.Merger: Merging 3 sorted segments
14/07/06 15:24:55 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 191 bytes
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
14/07/06 15:24:55 INFO mapred.Task: Task:attempt_local1851811203_0001_r_000000_0 is done. And is in the process of committing
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO mapred.Task: Task attempt_local1851811203_0001_r_000000_0 is allowed to commit now
14/07/06 15:24:55 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1851811203_0001_r_000000_0' to hdfs://localhost/totalOrderOutput/_temporary/0/task_local1851811203_0001_r_000000
14/07/06 15:24:55 INFO mapred.LocalJobRunner: reduce > reduce
14/07/06 15:24:55 INFO mapred.Task: Task 'attempt_local1851811203_0001_r_000000_0' done.
14/07/06 15:24:56 INFO mapreduce.Job: map 100% reduce 100%
14/07/06 15:24:56 INFO mapreduce.Job: Job job_local1851811203_0001 completed successfully
14/07/06 15:24:56 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=21871
FILE: Number of bytes written=768178
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=110
HDFS: Number of bytes written=74
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=18
Map output records=18
Map output bytes=179
Map output materialized bytes=233
Input split bytes=279
Combine input records=0
Combine output records=0
Reduce input groups=8
Reduce shuffle bytes=0
Reduce input records=18
Reduce output records=18
Spilled Records=36
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=54
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1372061696
File Input Format Counters
Bytes Read=35
File Output Format Counters
Bytes Written=74
Return value is 0
Even though I set the number of reducers to 2, only one reducer is created.
Answer (score: 2):
The reason is that you are running in local mode. You can take a look at the source code of the LocalJobRunner:
int numReduceTasks = job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    // we only allow 0 or 1 reducer in local mode
    numReduceTasks = 1;
    job.setNumReduceTasks(1);
}
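The effect of the check above can be seen in isolation with a small standalone sketch. ClampDemo and clampReducers are hypothetical names for illustration, not part of Hadoop; the sketch only mirrors the clamping condition quoted from LocalJobRunner:

```java
// Standalone sketch of LocalJobRunner's clamping behavior: any requested
// reducer count other than 0 or 1 is forced down to 1 in local mode.
public class ClampDemo {
    static int clampReducers(int requested) {
        if (requested > 1 || requested < 0) {
            return 1; // local mode only allows 0 or 1 reducer
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(clampReducers(2)); // a request for 2 is clamped to 1
        System.out.println(clampReducers(0)); // 0 is left alone
    }
}
```

This is why the job prints "The number of reducers :: 2" (the configuration value is read before submission) but still runs a single reduce task.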
To switch to pseudo-distributed mode, you need to configure:

mapreduce.framework.name = yarn

You currently have this set to local.
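As a sketch, assuming the stock Hadoop 2.x configuration layout, the property would go into mapred-site.xml under the Hadoop configuration directory:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of the local runner -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With this in place (and YARN daemons running), the job is submitted to YARN rather than LocalJobRunner, and the -D mapred.reduce.tasks=2 setting will actually produce two reduce tasks.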