Hadoop reducer (Java): context.write writes nothing

Date: 2019-03-14 14:48:30

Tags: java hadoop reducers

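My reducer function calls context.write(...), but nothing gets written. The odd thing is that the System.out.println(...) just above it works and prints the expected result (as you can see in the following screenshot):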

Image of the System.out.println trace

Here is the full code:

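import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Jointure {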

    public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text> {
        private boolean tab2 = false; // becomes true once the iteration over the lines reaches table 2

        public void map(Object key, org.apache.hadoop.io.Text value, Context context)
                throws IOException, InterruptedException {
            Arrays.stream(value.toString().split("\\r?\\n")).forEach(line -> { // iterate over each line of the input
                if ((!tab2) && (!line.equals(""))) { // line belongs to table 1
                    String[] parts = line.split(";");
                    int idtoWrite = Integer.parseInt(parts[0]);
                    String valueToWrite = parts[1] + ";Table1";
                    try {
                        // emit a key/value pair as output
                        context.write(new IntWritable(idtoWrite), new Text(valueToWrite));
                    } catch (Exception e) {
                        // a lambda cannot throw checked exceptions; rethrow rather than silently drop the record
                        throw new RuntimeException(e);
                    }
                } else if (line.equals("")) { // blank line separating the two tables
                    tab2 = true;
                } else if (tab2 && (!line.equals(""))) { // line belongs to table 2
                    String[] parts = line.split(";");
                    int idtoWrite = Integer.parseInt(parts[0]);
                    String valueToWrite = parts[1] + ";Table2";
                    try {
                        // emit a key/value pair as output
                        context.write(new IntWritable(idtoWrite), new Text(valueToWrite));
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
    }

    public static class IntSumReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
        public void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {

            ArrayList<String> listPrenom = new ArrayList<String>();
            ArrayList<String> listPays = new ArrayList<String>();
            for (Text val : values) {
                String[] parts = val.toString().split(";");
                String nomOuPays = parts[0];
                // the table tag is the second field; empty when the value carries no tag
                String table = parts.length > 1 ? parts[1] : "";

                if (table.equals("Table1")) {
                    listPrenom.add(nomOuPays);
                } else if (table.equals("Table2")) {
                    listPays.add(nomOuPays);
                }
            }

            for (int i = 0; i < listPrenom.size(); i++) {
                for (int j = 0; j < listPays.size(); j++) {
                    String toWrite = listPrenom.get(i) + " " + listPays.get(j);
                    System.out.println("=====================WRITE=======================");
                    System.out.println(toWrite);
                    context.write(key, new Text(toWrite));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "jointure");
        job.setJarByClass(Jointure.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Any ideas? Thanks for your time.

Edit:

Here is the complete log trace from running the program:

2019-03-14 20:05:03,049 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2019-03-14 20:05:03,475 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-03-14 20:05:03,542 INFO input.FileInputFormat: Total input files to process : 1
2019-03-14 20:05:03,564 INFO mapreduce.JobSubmitter: number of splits:1
2019-03-14 20:05:03,674 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1184033728_0001
2019-03-14 20:05:03,675 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-03-14 20:05:03,803 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2019-03-14 20:05:03,803 INFO mapreduce.Job: Running job: job_local1184033728_0001
2019-03-14 20:05:03,804 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,809 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-03-14 20:05:03,845 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:03,848 INFO mapred.LocalJobRunner: Waiting for map tasks
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,918 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:03,934 INFO mapred.MapTask: Processing split: file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/inputTab/file-tab:0+56
2019-03-14 20:05:04,046 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2019-03-14 20:05:04,046 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2019-03-14 20:05:04,046 INFO mapred.MapTask: soft limit at 83886080
2019-03-14 20:05:04,046 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2019-03-14 20:05:04,046 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2019-03-14 20:05:04,049 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-03-14 20:05:04,059 INFO mapred.LocalJobRunner: 
2019-03-14 20:05:04,059 INFO mapred.MapTask: Starting flush of map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: Spilling map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: bufstart = 0; bufend = 110; bufvoid = 104857600
2019-03-14 20:05:04,059 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
=====================WRITE=======================
Pierre Allemagne
=====================WRITE=======================
Pierre France
=====================WRITE=======================
Jacques France
2019-03-14 20:05:04,184 INFO mapred.MapTask: Finished spill 0
2019-03-14 20:05:04,234 INFO mapred.Task: Task:attempt_local1184033728_0001_m_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,237 INFO mapred.LocalJobRunner: map
2019-03-14 20:05:04,238 INFO mapred.Task: Task 'attempt_local1184033728_0001_m_000000_0' done.
2019-03-14 20:05:04,250 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_m_000000_0: Counters: 18
    File System Counters
        FILE: Number of bytes read=4319
        FILE: Number of bytes written=502994
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=7
        Map output records=6
        Map output bytes=110
        Map output materialized bytes=70
        Input split bytes=158
        Combine input records=6
        Combine output records=3
        Spilled Records=3
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=212860928
    File Input Format Counters 
        Bytes Read=56
2019-03-14 20:05:04,251 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,252 INFO mapred.LocalJobRunner: map task executor complete.
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:04,270 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:04,274 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@721f3077
2019-03-14 20:05:04,276 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2019-03-14 20:05:04,300 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=625370688, maxSingleShuffleLimit=156342672, mergeThreshold=412744672, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-03-14 20:05:04,301 INFO reduce.EventFetcher: attempt_local1184033728_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-03-14 20:05:04,321 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1184033728_0001_m_000000_0 decomp: 66 len: 70 to MEMORY
2019-03-14 20:05:04,325 INFO reduce.InMemoryMapOutput: Read 66 bytes from map-output for attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,326 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 66, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->66
2019-03-14 20:05:04,327 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2019-03-14 20:05:04,327 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,327 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-03-14 20:05:04,433 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,433 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,436 INFO reduce.MergeManagerImpl: Merged 1 segments, 66 bytes to disk to satisfy reduce memory limit
2019-03-14 20:05:04,438 INFO reduce.MergeManagerImpl: Merging 1 files, 70 bytes from disk
2019-03-14 20:05:04,440 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2019-03-14 20:05:04,440 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,443 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,445 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,493 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-03-14 20:05:04,498 INFO mapred.Task: Task:attempt_local1184033728_0001_r_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,504 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,505 INFO mapred.Task: Task attempt_local1184033728_0001_r_000000_0 is allowed to commit now
2019-03-14 20:05:04,541 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1184033728_0001_r_000000_0' to file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/output
2019-03-14 20:05:04,542 INFO mapred.LocalJobRunner: reduce > reduce
2019-03-14 20:05:04,542 INFO mapred.Task: Task 'attempt_local1184033728_0001_r_000000_0' done.
2019-03-14 20:05:04,544 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_r_000000_0: Counters: 24
    File System Counters
        FILE: Number of bytes read=4491
        FILE: Number of bytes written=503072
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=70
        Reduce input records=3
        Reduce output records=0
        Spilled Records=3
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=6
        Total committed heap usage (bytes)=212860928
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=8
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: reduce task executor complete.
2019-03-14 20:05:04,807 INFO mapreduce.Job: Job job_local1184033728_0001 running in uber mode : false
2019-03-14 20:05:04,811 INFO mapreduce.Job:  map 100% reduce 100%
2019-03-14 20:05:04,816 INFO mapreduce.Job: Job job_local1184033728_0001 completed successfully
2019-03-14 20:05:04,846 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=8810
        FILE: Number of bytes written=1006066
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=7
        Map output records=6
        Map output bytes=110
        Map output materialized bytes=70
        Input split bytes=158
        Combine input records=6
        Combine output records=3
        Reduce input groups=2
        Reduce shuffle bytes=70
        Reduce input records=3
        Reduce output records=0
        Spilled Records=6
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=6
        Total committed heap usage (bytes)=425721856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=56
    File Output Format Counters 
        Bytes Written=8
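
A possible reading of the trace above: the WRITE lines are printed during the map task's spill (before "Finished spill 0"), and the counters show Combine input records=6, Combine output records=3, then Reduce input records=3 and Reduce output records=0. That pattern is consistent with IntSumReducer running as a combiner (it is registered through job.setCombinerClass), so the joined strings it emits, which no longer carry the ;Table1 / ;Table2 tag, are what the real reduce step receives. The sketch below reproduces that effect in plain Java without Hadoop. The class name CombinerEffect, the join helper and the sample values are hypothetical and only meant as an illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerEffect {

    // Same logic as the reduce() in the question, applied to the values of one key:
    // sort values by their table tag, then emit every name/country combination.
    static List<String> join(List<String> values) {
        List<String> names = new ArrayList<>();
        List<String> countries = new ArrayList<>();
        for (String v : values) {
            String[] parts = v.split(";");
            String table = parts.length > 1 ? parts[1] : "";
            if (table.equals("Table1")) {
                names.add(parts[0]);
            } else if (table.equals("Table2")) {
                countries.add(parts[0]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String n : names) {
            for (String c : countries) {
                out.add(n + " " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical map output for a single key (assumed sample data).
        List<String> mapOutput = Arrays.asList("Pierre;Table1", "Allemagne;Table2", "France;Table2");

        // First pass: what a combiner using the reducer class would emit.
        List<String> combined = join(mapOutput);
        // Second pass: what the reducer does with the combiner's untagged output.
        List<String> reduced = join(combined);

        System.out.println("combiner output: " + combined); // [Pierre Allemagne, Pierre France]
        System.out.println("reducer output:  " + reduced);  // []
    }
}
```

If this is indeed the cause, removing the job.setCombinerClass(IntSumReducer.class) line would be the first thing to try, because a combiner's output has to be valid input for the reducer, and here it is not.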

0 answers:

No answers yet.