MapReduce program does not run at all. No error messages and no logs. How can I check what is happening?

Asked: 2016-06-15 22:58:04

Tags: hadoop mapreduce

I can compile the driver, Mapper, and Reducer programs without errors. I created the jar file and even checked the input dataset. Everything looks fine. Below are the driver, mapper, and reducer. Can someone spot the silly mistake I am making? I have written 5 other MapReduce Java programs and they all ran fine; I uploaded them to GitHub.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

//This driver program brings together all the information needed to submit this MapReduce job.

public class DictionaryDrv {

public static void main(String[] args) throws Exception{

    if (args.length !=2){

        System.err.println("Usage: MultiLangDictionary <input path> <output path>");

        System.exit(-1);            

    }

    //To submit a mapreduce job we need the following information:
        //a. Input location where the input dataset is
        //b. Output location where the mapreduce job should write its results
        //c. Name of the Mapper class that should be executed
        //d. Name of the Reducer class that should be executed (if one is needed; sometimes we don't need a reducer because we don't need to aggregate the mapper output)

    //reads the default configuration of cluster from the configuration xml files
    // https://www.quora.com/What-is-the-use-of-a-configuration-class-and-object-in-Hadoop-MapReduce-code


    Configuration conf = new Configuration();

    //Initializing the job with the default configuration of the cluster
    //When we submit a mapreduce job, it will be distributed across all the nodes in the cluster, so we need to give the job a name so that hadoop can identify the job to run


    Job ajob = new Job(conf, "MultiLangDictionary");

     //Assigning the driver class name
    ajob.setJarByClass(DictionaryDrv.class);

    //first argument is the job itself
    //second argument is location of the input dataset
    FileInputFormat.addInputPath(ajob, new Path(args[0]));

    //first argument is the job itself
    //second argument is the location of the output dataset
    FileOutputFormat.setOutputPath(ajob, new Path(args[1]));

    //Defining the input Format class which is responsible for parsing the dataset into key-value pairs
    //Configuring the input/output path from the filesystem into the job
    // InputFormat is responsible for 3 main tasks:
    //      a. Validate inputs - meaning the dataset exists in the location specified.
    //      b. Split up the input files into logical input splits. Each input split will be assigned to an individual mapper.
    //      c. Provide the RecordReader implementation that extracts logical records for the mapper to process

    ajob.setInputFormatClass(TextInputFormat.class);

    //Defining the output Format class which is responsible for writing the final key-value output from the MR framework to a text file on disk
    //OutputFormat does 2 main things:
    //  a. Validate the output specification, e.g. check whether the output directory already exists. If the directory exists, it throws an error.
    //  b. Provide the RecordWriter implementation used to write the output files of the job
    //Hadoop comes with several output format implementations.

    ajob.setOutputFormatClass(TextOutputFormat.class);


    //Defining the mapper class name
    ajob.setMapperClass(DictionaryMapper.class);

    //Defining the Reducer class name
    ajob.setReducerClass(DictionaryReducer.class);

    //Output types  
    //Output key from the mapper class
    ajob.setMapOutputKeyClass(Text.class);

    //Output value from the mapper class
    ajob.setMapOutputValueClass(Text.class);


    //setting the second argument as a path in a path variable
    Path outputPath = new Path(args[1]);

    //deleting the output path automatically from hdfs so that we don't have to delete it explicitly
    outputPath.getFileSystem(conf).delete(outputPath);



}

}

3 Answers:

Answer 0 (score: 0)

Remove the last line, outputPath.getFileSystem(conf).delete(outputPath), and try again. The program may actually be running, but because you delete the output directory at the end, you cannot see any output.
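If the automatic cleanup is still wanted, a common alternative (a minimal sketch, not the asker's exact code; it needs an extra org.apache.hadoop.fs.FileSystem import, and FileSystem.delete takes a recursive flag) is to clear the old output before the job is submitted rather than after:

    // At the top of the file: import org.apache.hadoop.fs.FileSystem;
    // In main(), before the job is submitted:
    Path outputPath = new Path(args[1]);
    FileSystem fs = outputPath.getFileSystem(conf);
    if (fs.exists(outputPath)) {
        // Recursively remove output from a previous run so the
        // OutputFormat's "directory already exists" check passes.
        fs.delete(outputPath, true);
    }
    FileOutputFormat.setOutputPath(ajob, outputPath);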

Answer 1 (score: 0)

Where do you actually submit the job? The code above is missing something like:

ajob.waitForCompletion(true)

This should go at the end of main. See the word count example for reference:

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0
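For context, a minimal sketch of how the end of main() would look with the submission call added (variable names follow the driver code above):

    // ... all the job configuration above stays the same ...

    // Submit the job to the cluster and block until it finishes.
    // Passing 'true' makes the framework print progress to the console;
    // without this call the job is configured but never submitted,
    // which is why the original program produced no output and no logs.
    boolean success = ajob.waitForCompletion(true);
    System.exit(success ? 0 : 1);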

Answer 2 (score: 0)

I faced a similar problem too. I was missing:

    ajob.waitForCompletion(true);

Adding it fixed the problem for me.

In general, I use the following code to exit the program:

    boolean result = ajob.waitForCompletion(true);

    System.exit(result?0:1);

This exits with a success status only if the job completed successfully.