My reducer function calls context.write(...), but nothing gets written. Strangely, the System.out.println(...) just above it works and prints the expected result (as you can see in the screenshot below):
Image of the System.out.println trace
Here is the full code:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Jointure {

    public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text> {
        private boolean tab2 = false; // true once the iteration over the lines reaches table 2

        public void map(Object key, org.apache.hadoop.io.Text value, Context context)
                throws IOException, InterruptedException {
            // iterate over each line of the input file
            Arrays.stream(value.toString().split("\\r?\\n")).forEach(line -> {
                if ((!tab2) && (!line.equals(""))) { // line belongs to table 1
                    String[] parts = line.split(";");
                    int idtoWrite = Integer.parseInt(parts[0]);
                    String valueToWrite = parts[1] + ";Table1";
                    try {
                        // emit a key/value pair as output
                        context.write(new IntWritable(idtoWrite), new Text(valueToWrite));
                    } catch (Exception e) {
                    }
                } else if (line.equals("")) { // blank line separating the two tables
                    tab2 = true;
                } else if (tab2 && (!line.equals(""))) { // line belongs to table 2
                    String[] parts = line.split(";");
                    int idtoWrite = Integer.parseInt(parts[0]);
                    String valueToWrite = parts[1] + ";Table2";
                    try {
                        // emit a key/value pair as output
                        context.write(new IntWritable(idtoWrite), new Text(valueToWrite));
                    } catch (Exception e) {
                    }
                }
            });
        }
    }

    public static class IntSumReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        public void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            ArrayList<String> listPrenom = new ArrayList<String>(); // first names (table 1)
            ArrayList<String> listPays = new ArrayList<String>();   // countries (table 2)
            for (Text val : values) {
                String[] parts = val.toString().split(";");
                String nomOuPays = parts[0];
                String table = "";
                try {
                    table = parts[1];
                } catch (Exception e) {
                }
                if (table.equals("Table1")) {
                    listPrenom.add(nomOuPays);
                } else if (table.equals("Table2")) {
                    listPays.add(nomOuPays);
                }
            }
            for (int i = 0; i < listPrenom.size(); i++) {
                for (int j = 0; j < listPays.size(); j++) {
                    String toWrite = listPrenom.get(i) + " " + listPays.get(j);
                    System.out.println("=====================WRITE=======================");
                    System.out.println(toWrite);
                    context.write(key, new Text(toWrite));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "jointure");
        job.setJarByClass(Jointure.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Any ideas? Thank you for your time.
Edit:
Here is the full log trace from running the program:
2019-03-14 20:05:03,049 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2019-03-14 20:05:03,475 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-03-14 20:05:03,542 INFO input.FileInputFormat: Total input files to process : 1
2019-03-14 20:05:03,564 INFO mapreduce.JobSubmitter: number of splits:1
2019-03-14 20:05:03,674 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1184033728_0001
2019-03-14 20:05:03,675 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-03-14 20:05:03,803 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2019-03-14 20:05:03,803 INFO mapreduce.Job: Running job: job_local1184033728_0001
2019-03-14 20:05:03,804 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,809 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-03-14 20:05:03,845 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:03,848 INFO mapred.LocalJobRunner: Waiting for map tasks
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,918 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:03,934 INFO mapred.MapTask: Processing split: file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/inputTab/file-tab:0+56
2019-03-14 20:05:04,046 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2019-03-14 20:05:04,046 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2019-03-14 20:05:04,046 INFO mapred.MapTask: soft limit at 83886080
2019-03-14 20:05:04,046 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2019-03-14 20:05:04,046 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2019-03-14 20:05:04,049 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-03-14 20:05:04,059 INFO mapred.LocalJobRunner:
2019-03-14 20:05:04,059 INFO mapred.MapTask: Starting flush of map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: Spilling map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: bufstart = 0; bufend = 110; bufvoid = 104857600
2019-03-14 20:05:04,059 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
=====================WRITE=======================
Pierre Allemagne
=====================WRITE=======================
Pierre France
=====================WRITE=======================
Jacques France
2019-03-14 20:05:04,184 INFO mapred.MapTask: Finished spill 0
2019-03-14 20:05:04,234 INFO mapred.Task: Task:attempt_local1184033728_0001_m_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,237 INFO mapred.LocalJobRunner: map
2019-03-14 20:05:04,238 INFO mapred.Task: Task 'attempt_local1184033728_0001_m_000000_0' done.
2019-03-14 20:05:04,250 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=4319
FILE: Number of bytes written=502994
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=7
Map output records=6
Map output bytes=110
Map output materialized bytes=70
Input split bytes=158
Combine input records=6
Combine output records=3
Spilled Records=3
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=212860928
File Input Format Counters
Bytes Read=56
2019-03-14 20:05:04,251 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,252 INFO mapred.LocalJobRunner: map task executor complete.
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:04,270 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:04,274 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@721f3077
2019-03-14 20:05:04,276 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2019-03-14 20:05:04,300 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=625370688, maxSingleShuffleLimit=156342672, mergeThreshold=412744672, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-03-14 20:05:04,301 INFO reduce.EventFetcher: attempt_local1184033728_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-03-14 20:05:04,321 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1184033728_0001_m_000000_0 decomp: 66 len: 70 to MEMORY
2019-03-14 20:05:04,325 INFO reduce.InMemoryMapOutput: Read 66 bytes from map-output for attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,326 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 66, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->66
2019-03-14 20:05:04,327 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2019-03-14 20:05:04,327 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,327 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-03-14 20:05:04,433 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,433 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,436 INFO reduce.MergeManagerImpl: Merged 1 segments, 66 bytes to disk to satisfy reduce memory limit
2019-03-14 20:05:04,438 INFO reduce.MergeManagerImpl: Merging 1 files, 70 bytes from disk
2019-03-14 20:05:04,440 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2019-03-14 20:05:04,440 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,443 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,445 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,493 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-03-14 20:05:04,498 INFO mapred.Task: Task:attempt_local1184033728_0001_r_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,504 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,505 INFO mapred.Task: Task attempt_local1184033728_0001_r_000000_0 is allowed to commit now
2019-03-14 20:05:04,541 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1184033728_0001_r_000000_0' to file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/output
2019-03-14 20:05:04,542 INFO mapred.LocalJobRunner: reduce > reduce
2019-03-14 20:05:04,542 INFO mapred.Task: Task 'attempt_local1184033728_0001_r_000000_0' done.
2019-03-14 20:05:04,544 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=4491
FILE: Number of bytes written=503072
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=70
Reduce input records=3
Reduce output records=0
Spilled Records=3
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=6
Total committed heap usage (bytes)=212860928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=8
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: reduce task executor complete.
2019-03-14 20:05:04,807 INFO mapreduce.Job: Job job_local1184033728_0001 running in uber mode : false
2019-03-14 20:05:04,811 INFO mapreduce.Job: map 100% reduce 100%
2019-03-14 20:05:04,816 INFO mapreduce.Job: Job job_local1184033728_0001 completed successfully
2019-03-14 20:05:04,846 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=8810
FILE: Number of bytes written=1006066
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=7
Map output records=6
Map output bytes=110
Map output materialized bytes=70
Input split bytes=158
Combine input records=6
Combine output records=3
Reduce input groups=2
Reduce shuffle bytes=70
Reduce input records=3
Reduce output records=0
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=6
Total committed heap usage (bytes)=425721856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=56
File Output Format Counters
Bytes Written=8
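One possible reading of these counters: `Combine input records=6` and `Combine output records=3` show that IntSumReducer also ran as the combiner (it is registered through job.setCombinerClass), and the WRITE lines are printed between "Spilling map output" and "Finished spill 0", that is, during the map task rather than the reduce task. The values the combiner emits no longer carry the `;Table1` / `;Table2` tag, so a second pass through the same logic would match neither branch, which would be consistent with `Reduce output records=0`. The plain-Java sketch below has no Hadoop dependency and only replays that two-pass flow. The class name `CombinerReplay`, the method `join`, and the sample values are hypothetical, chosen to resemble the printed trace; the real input file is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerReplay {

    // Same join logic as IntSumReducer.reduce, but returning the joined
    // values instead of calling context.write (hypothetical helper).
    public static List<String> join(List<String> values) {
        List<String> listPrenom = new ArrayList<>();
        List<String> listPays = new ArrayList<>();
        for (String val : values) {
            String[] parts = val.split(";");
            // A value without a ";TableN" suffix gets an empty tag.
            String table = parts.length > 1 ? parts[1] : "";
            if (table.equals("Table1")) {
                listPrenom.add(parts[0]);
            } else if (table.equals("Table2")) {
                listPays.add(parts[0]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String prenom : listPrenom) {
            for (String pays : listPays) {
                out.add(prenom + " " + pays);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed map output for a single key (sample data, not the real file).
        List<String> mapOutput =
                Arrays.asList("Pierre;Table1", "Allemagne;Table2", "France;Table2");

        // First pass: what the combiner would emit on the map side.
        List<String> afterCombiner = join(mapOutput);
        // Second pass: the reducer sees the combiner's untagged output.
        List<String> afterReducer = join(afterCombiner);

        System.out.println("combiner pass: " + afterCombiner); // [Pierre Allemagne, Pierre France]
        System.out.println("reducer pass:  " + afterReducer);  // []
    }
}
```

If this reading is right, removing the job.setCombinerClass(IntSumReducer.class) line, or using a combiner whose output keeps the tagged format the reducer expects, would be worth trying.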