I have a simple MapReduce job that uses the default mapper and reducer. The input is a few text files, and I am running Hadoop 2.x in pseudo-distributed mode. What worries me is that even though I set mapred.reduce.tasks=2, only one reducer is ever invoked.
package org.priya.sort;

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TestingReduce extends Configured implements Tool {

    @Override
    public int run(String[] arg0) throws Exception {
        System.out.println("###########I am in TestingReduce###########");
        Job job = Job.getInstance(getConf());
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setJarByClass(TestingReduce.class);
        System.out.println("#########The number of reducers :: " + job.getNumReduceTasks());
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/totalOrderOutput"));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int i = ToolRunner.run(new TestingReduce(), args);
        System.out.println("Return value is " + i);
    }
}
I run the job with the following command:

hadoop jar TestingReducer.jar -D mapred.reduce.tasks=2

The console output is:
###########I am in TestingReduce###########
OpenJDK 64-Bit Server VM warning: You have loaded library /home/priya/workspace/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
14/07/06 15:24:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
#########The number of reducers :: 2
14/07/06 15:24:48 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/07/06 15:24:48 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/07/06 15:24:49 INFO input.FileInputFormat: Total input paths to process : 3
14/07/06 15:24:50 INFO mapreduce.JobSubmitter: number of splits:3
14/07/06 15:24:50 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/07/06 15:24:50 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/07/06 15:24:50 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/06 15:24:50 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/07/06 15:24:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1851811203_0001
14/07/06 15:24:51 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/staging/priya1851811203/.staging/job_local1851811203_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/07/06 15:24:51 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/staging/priya1851811203/.staging/job_local1851811203_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/07/06 15:24:52 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/local/localRunner/priya/job_local1851811203_0001/job_local1851811203_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/07/06 15:24:52 WARN conf.Configuration: file:/home/priya/hdfs-tmp/mapred/local/localRunner/priya/job_local1851811203_0001/job_local1851811203_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/07/06 15:24:52 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/07/06 15:24:52 INFO mapreduce.Job: Running job: job_local1851811203_0001
14/07/06 15:24:52 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/07/06 15:24:52 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/07/06 15:24:53 INFO mapred.LocalJobRunner: Waiting for map tasks
14/07/06 15:24:53 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000000_0
14/07/06 15:24:53 INFO mapreduce.Job: Job job_local1851811203_0001 running in uber mode : false
14/07/06 15:24:53 INFO mapreduce.Job: map 0% reduce 0%
14/07/06 15:24:53 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:53 INFO mapred.MapTask: Processing split: hdfs://localhost/input/2.txt:0+15
14/07/06 15:24:53 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:53 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:53 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:53 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:53 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:53 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:54 INFO mapred.LocalJobRunner:
14/07/06 15:24:54 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:54 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufend = 79; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214368(104857472); length = 29/6553600
14/07/06 15:24:54 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:54 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000000_0 is done. And is in the process of committing
14/07/06 15:24:54 INFO mapred.LocalJobRunner: map
14/07/06 15:24:54 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000000_0' done.
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000000_0
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000001_0
14/07/06 15:24:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:54 INFO mapred.MapTask: Processing split: hdfs://localhost/input/1.txt:0+10
14/07/06 15:24:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:54 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:54 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:54 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:54 INFO mapreduce.Job: map 100% reduce 0%
14/07/06 15:24:54 INFO mapred.LocalJobRunner:
14/07/06 15:24:54 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:54 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufend = 50; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
14/07/06 15:24:54 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:54 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000001_0 is done. And is in the process of committing
14/07/06 15:24:54 INFO mapred.LocalJobRunner: map
14/07/06 15:24:54 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000001_0' done.
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000001_0
14/07/06 15:24:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1851811203_0001_m_000002_0
14/07/06 15:24:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:54 INFO mapred.MapTask: Processing split: hdfs://localhost/input/3.txt:0+10
14/07/06 15:24:54 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/07/06 15:24:54 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/07/06 15:24:54 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/07/06 15:24:54 INFO mapred.MapTask: soft limit at 83886080
14/07/06 15:24:54 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/07/06 15:24:54 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO mapred.MapTask: Starting flush of map output
14/07/06 15:24:55 INFO mapred.MapTask: Spilling map output
14/07/06 15:24:55 INFO mapred.MapTask: bufstart = 0; bufend = 50; bufvoid = 104857600
14/07/06 15:24:55 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214380(104857520); length = 17/6553600
14/07/06 15:24:55 INFO mapred.MapTask: Finished spill 0
14/07/06 15:24:55 INFO mapred.Task: Task:attempt_local1851811203_0001_m_000002_0 is done. And is in the process of committing
14/07/06 15:24:55 INFO mapred.LocalJobRunner: map
14/07/06 15:24:55 INFO mapred.Task: Task 'attempt_local1851811203_0001_m_000002_0' done.
14/07/06 15:24:55 INFO mapred.LocalJobRunner: Finishing task: attempt_local1851811203_0001_m_000002_0
14/07/06 15:24:55 INFO mapred.LocalJobRunner: Map task executor complete.
14/07/06 15:24:55 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/07/06 15:24:55 INFO mapred.Merger: Merging 3 sorted segments
14/07/06 15:24:55 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 191 bytes
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
14/07/06 15:24:55 INFO mapred.Task: Task:attempt_local1851811203_0001_r_000000_0 is done. And is in the process of committing
14/07/06 15:24:55 INFO mapred.LocalJobRunner:
14/07/06 15:24:55 INFO mapred.Task: Task attempt_local1851811203_0001_r_000000_0 is allowed to commit now
14/07/06 15:24:55 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1851811203_0001_r_000000_0' to hdfs://localhost/totalOrderOutput/_temporary/0/task_local1851811203_0001_r_000000
14/07/06 15:24:55 INFO mapred.LocalJobRunner: reduce > reduce
14/07/06 15:24:55 INFO mapred.Task: Task 'attempt_local1851811203_0001_r_000000_0' done.
14/07/06 15:24:56 INFO mapreduce.Job: map 100% reduce 100%
14/07/06 15:24:56 INFO mapreduce.Job: Job job_local1851811203_0001 completed successfully
14/07/06 15:24:56 INFO mapreduce.Job: Counters: 32
File System Counters
FILE: Number of bytes read=21871
FILE: Number of bytes written=768178
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=110
HDFS: Number of bytes written=74
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=18
Map output records=18
Map output bytes=179
Map output materialized bytes=233
Input split bytes=279
Combine input records=0
Combine output records=0
Reduce input groups=8
Reduce shuffle bytes=0
Reduce input records=18
Reduce output records=18
Spilled Records=36
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=54
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=1372061696
File Input Format Counters
Bytes Read=35
File Output Format Counters
Bytes Written=74
Return value is 0
Even though I set the number of reducers to 2, only one reducer is created.
Answer (score: 2):
The reason is that you are running in local mode. You can take a look at the source code of the LocalJobRunner:
int numReduceTasks = job.getNumReduceTasks();
if (numReduceTasks > 1 || numReduceTasks < 0) {
    // we only allow 0 or 1 reducer in local mode
    numReduceTasks = 1;
    job.setNumReduceTasks(1);
}
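The effect of the check above can be seen in isolation with a small standalone sketch. ClampDemo and clampReducers are hypothetical names for illustration, not part of Hadoop; the sketch only mirrors the clamping condition quoted from LocalJobRunner:

```java
// Standalone sketch of LocalJobRunner's clamping behavior: any requested
// reducer count other than 0 or 1 is forced down to 1 in local mode.
public class ClampDemo {
    static int clampReducers(int requested) {
        if (requested > 1 || requested < 0) {
            return 1; // local mode only allows 0 or 1 reducer
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(clampReducers(2)); // a request for 2 is clamped to 1
        System.out.println(clampReducers(0)); // 0 is left alone
    }
}
```

This is why the job prints "The number of reducers :: 2" (the configuration value is read before submission) but still runs a single reduce task.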
To switch to pseudo-distributed mode, you need to configure:

mapreduce.framework.name = yarn

You currently have this set to local.
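As a sketch, assuming the stock Hadoop 2.x configuration layout, the property would go into mapred-site.xml under the Hadoop configuration directory:

```xml
<!-- mapred-site.xml: run MapReduce jobs on YARN instead of the local runner -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

With this in place (and YARN daemons running), the job is submitted to YARN rather than LocalJobRunner, and the -D mapred.reduce.tasks=2 setting will actually produce two reduce tasks.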