I can compile the driver, mapper, and reducer programs without errors. I created the jar file and even checked the input dataset. Everything looks fine. Below are the driver, mapper, and reducer. Can someone spot the silly mistake I am making? I have written 5 other MapReduce Java programs, and they all run fine; I uploaded them to GitHub.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
//This driver class supplies all the information needed to submit this MapReduce job.
public class DictionaryDrv {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MultiLangDictionary <input path> <output path>");
            System.exit(-1);
        }
        //To submit a MapReduce job we need the following information:
        //a. Input location of the input dataset
        //b. Output location where the job should write its results
        //c. Name of the Mapper class to execute
        //d. Name of the Reducer class to execute (if one is needed; some jobs skip the
        //   reducer because the mapper output needs no aggregation)

        //Reads the default cluster configuration from the configuration XML files
        //https://www.quora.com/What-is-the-use-of-a-configuration-class-and-object-in-Hadoop-MapReduce-code
        Configuration conf = new Configuration();
        //Initialize the job with the default configuration of the cluster.
        //A submitted job is distributed across the nodes of the cluster, so we give it
        //a name that Hadoop can track it by.
        Job ajob = new Job(conf, "MultiLangDictionary");
        //Set the driver class
        ajob.setJarByClass(DictionaryDrv.class);
        //first argument: the job itself; second argument: location of the input dataset
        FileInputFormat.addInputPath(ajob, new Path(args[0]));
        //first argument: the job itself; second argument: location of the output dataset
        FileOutputFormat.setOutputPath(ajob, new Path(args[1]));
        //The InputFormat class parses the dataset into key-value pairs. It has 3 main tasks:
        //a. Validate the inputs, i.e. that the dataset exists at the specified location.
        //b. Split the input files into logical input splits; each split is assigned to an
        //   individual mapper.
        //c. Provide the RecordReader implementation that extracts the logical records the
        //   mapper processes.
        ajob.setInputFormatClass(TextInputFormat.class);
        //The OutputFormat class writes the final key-value output of the MR framework to a
        //text file on disk. It does 2 main things:
        //a. Validate the output specification, e.g. check whether the output directory
        //   already exists (if it does, the job fails with an error).
        //b. Provide the RecordWriter implementation that writes the job's output files.
        //Hadoop ships with several OutputFormat implementations.
        ajob.setOutputFormatClass(TextOutputFormat.class);
        //Set the Mapper class
        ajob.setMapperClass(DictionaryMapper.class);
        //Set the Reducer class
        ajob.setReducerClass(DictionaryReducer.class);
        //Output types
        //Output key type from the mapper
        ajob.setMapOutputKeyClass(Text.class);
        //Output value type from the mapper
        ajob.setMapOutputValueClass(Text.class);
        //Wrap the second argument in a Path variable
        Path outputPath = new Path(args[1]);
        //Delete the output path from HDFS automatically so that we don't have to delete
        //it explicitly
        outputPath.getFileSystem(conf).delete(outputPath);
    }
}
Answer 0 (score: 0)
Remove the last line, outputPath.getFileSystem(conf).delete(outputPath), and try again. The program may actually be running, but because you delete the output directory at the very end, you never get to see any output.
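The ordering is the point here: stale output must be cleared before the job runs, not after it finishes. The same idea can be shown in plain Java without a Hadoop cluster; this is a minimal sketch using a hypothetical out-demo directory, not part of the original driver.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;

public class DeleteBeforeRun {
    public static void main(String[] args) throws IOException {
        Path out = Paths.get("out-demo"); // hypothetical output directory
        // Correct order: remove any stale output *before* producing new output,
        // i.e. outputPath.getFileSystem(conf).delete(outputPath) belongs before
        // job submission, not after it.
        if (Files.exists(out)) {
            Files.walk(out)
                 .sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
        Files.createDirectory(out);
        Files.write(out.resolve("part-r-00000"), "hello\t1\n".getBytes());
        // The output survives because nothing deletes it after the "job".
        System.out.print(new String(Files.readAllBytes(out.resolve("part-r-00000"))));
    }
}
```

Deleting at the end, as the posted driver does, would wipe part-r-00000 before anyone could read it.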
Answer 1 (score: 0)
Where do you actually submit the job? The code above is missing something like:
ajob.waitForCompletion(true)
This should go at the end of main. See the word count example for reference.
Answer 2 (score: 0)
I faced a similar problem. I was missing
ajob.waitForCompletion(true);
and adding it fixed the problem for me.
In general, I use the following code to exit the program:
boolean result = ajob.waitForCompletion(true);
System.exit(result?0:1);
This makes the program exit with a success code only when the job completed successfully.
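The exit-code convention in that snippet can be checked without a cluster. A minimal plain-Java sketch, where the exitCode helper is hypothetical and stands in for the result ? 0 : 1 expression above:

```java
public class ExitCodeDemo {
    // Mirrors System.exit(result ? 0 : 1): 0 signals success to the shell
    // or a workflow scheduler, any non-zero value signals failure.
    static int exitCode(boolean jobSucceeded) {
        return jobSucceeded ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(exitCode(true));  // successful job → 0
        System.out.println(exitCode(false)); // failed job → 1
    }
}
```

Callers such as shell scripts or schedulers rely on this convention to decide whether downstream steps should run.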