I am writing a program that receives the source code of mappers/reducers, dynamically compiles them, and builds a JAR file out of them. It then has to run this JAR on a Hadoop cluster.
For the last part, I set all the required parameters dynamically through my code. The problem I am now facing is that the code needs the compiled mapper and reducer classes at compile time, but I do not have them then; they are only received later, at runtime (e.g., in a message from a remote node). Any ideas/suggestions on how to get around this problem?
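For context, the dynamic-compilation step described above can be done with the standard javax.tools API. A minimal sketch, assuming the received source has been written to a file (the file and folder names here are illustrative, not from the original code):

import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Compile the received source to .class files; these can then be packaged
// into the jar that "mapred.jar" points at below. Requires a JDK: a plain
// JRE returns null for the system compiler.
JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
int result = compiler.run(null, null, null,
        "-d", "classes_output_folder",
        "Mapper_Reducer_Classes.java");
if (result != 0) {
    throw new IllegalStateException("Compilation failed");
}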
Below is the code for that last part. The problem is that job.setMapperClass(Mapper_Class.class) and job.setReducerClass(Reducer_Class.class) require the class files (Mapper_Class.class and Reducer_Class.class) to be present at compile time:
private boolean run_Hadoop_Job(String className) {
    try {
        System.out.println("Starting to run the code on Hadoop...");
        String[] argsTemp = { "project_test/input", "project_test/output" };
        // create a configuration
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:54310");
        conf.set("mapred.job.tracker", "localhost:54311");
        conf.set("mapred.jar", jar_Output_Folder + java.io.File.separator
                + className + ".jar");
        conf.set("mapreduce.map.class", "Mapper_Reducer_Classes$Mapper_Class.class");
        conf.set("mapreduce.reduce.class", "Mapper_Reducer_Classes$Reducer_Class.class");
        // create a new job based on the configuration
        Job job = new Job(conf, "Hadoop Example for dynamically and programmatically compiling-running a job");
        job.setJarByClass(Platform.class);
        //job.setMapperClass(Mapper_Class.class);
        //job.setReducerClass(Reducer_Class.class);
        // key/value of the reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(argsTemp[0]));
        // delete a possibly existing output path to prevent job failures
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(argsTemp[1]);
        fs.delete(out, true);
        // finally set the (now empty) output path
        FileOutputFormat.setOutputPath(job, new Path(argsTemp[1]));
        //job.submit();
        System.exit(job.waitForCompletion(true) ? 0 : 1);
        System.out.println("Job Finished!");
    } catch (Exception e) { return false; }
    return true;
}
REVISION: So I modified the code to specify the mapper and the reducer with conf.set("mapreduce.map.class", "my mapper.class"). The code now compiles correctly, but when it is executed it throws the following error:
Dec 24, 2012 6:49:43 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Task Id : attempt_201212240511_0006_m_000001_2, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException: Mapper_Reducer_Classes$Mapper_Class.class
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
    at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
Answer 0 (score: 2)
If you don't have them at compile time, then set the names directly in the configuration, like this:
conf.set("mapreduce.map.class", "org.what.ever.ClassName");
conf.set("mapreduce.reduce.class", "org.what.ever.ClassName");
Answer 1 (score: 1)
The problem is that the TaskTracker cannot see classes that exist only in your local JRE.
I figured it out this way (Maven project):
First, add this plugin to pom.xml. It builds your application jar file with all the dependency jars bundled in:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
      <configuration>
        <filters>
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <finalName>sample</finalName>
        <!--
        <finalName>uber-${artifactId}-${version}</finalName>
        -->
      </configuration>
    </plugin>
  </plugins>
</build>
Then, in the Java source code, add these lines. They point the job at your sample.jar, which the <finalName> tag in pom.xml above builds to target/sample.jar:
Configuration config = new Configuration();
config.set("fs.default.name", "hdfs://ip:port");
config.set("mapred.job.tracker", "hdfs://ip:port");
JobConf job = new JobConf(config);
job.setJar("target/sample.jar");
This way, your TaskTrackers can reference the classes you wrote, and the ClassNotFoundException no longer occurs. (Unlike setJarByClass, job.setJar names the jar file explicitly, so it does not depend on the class already sitting in a jar on the client classpath.)
Answer 2 (score: 0)
You only need a reference to the Class object for the class that will be created dynamically. Use Class.forName("foo.Mapper") instead of foo.Mapper.class.
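A minimal sketch of that suggestion against the question's Job-based code ("foo.Mapper" and "foo.Reducer" are placeholder names; the classes must still end up on the job's classpath, e.g. inside the jar set via mapred.jar):

// Load the dynamically compiled classes by name at runtime and register
// them on the Job. The unchecked casts are needed because Class.forName
// returns Class<?>; it also throws the checked ClassNotFoundException.
@SuppressWarnings("unchecked")
Class<? extends Mapper> mapperClass =
        (Class<? extends Mapper>) Class.forName("foo.Mapper");
@SuppressWarnings("unchecked")
Class<? extends Reducer> reducerClass =
        (Class<? extends Reducer>) Class.forName("foo.Reducer");
job.setMapperClass(mapperClass);
job.setReducerClass(reducerClass);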