For several hours I have been trying to get an external JAR into my code, but without luck, so maybe someone can help me. I am using Hadoop 2.5.
This is the external JAR I am trying to use:
public class SampleAddition {
    private int firstVariable;
    private int secondVariable;

    public SampleAddition(int firstVariable, int secondVariable) {
        this.firstVariable = firstVariable;
        this.secondVariable = secondVariable;
    }

    public int getResult() {
        int result = firstVariable + secondVariable;
        return result;
    }
}
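As a quick sanity check outside Hadoop, the class can be exercised on its own. A minimal sketch; the `SampleAdditionDemo` wrapper is not part of the original code, it just embeds a copy of the class so the snippet is self-contained:

```java
public class SampleAdditionDemo {
    // Standalone copy of the external class, for testing without Hadoop
    static class SampleAddition {
        private int firstVariable;
        private int secondVariable;

        SampleAddition(int firstVariable, int secondVariable) {
            this.firstVariable = firstVariable;
            this.secondVariable = secondVariable;
        }

        int getResult() {
            return firstVariable + secondVariable;
        }
    }

    public static void main(String[] args) {
        // Same values the reducer below uses: 55 + 100
        SampleAddition test = new SampleAddition(55, 100);
        System.out.println(test.getResult()); // prints 155
    }
}
```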
For the MapReduce code I used the simple WordCount example:
import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SampleAdditionMapRed {

    // Main method
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "SampleAddition MapReduce");

        // Set classes
        job.setJarByClass(SampleAdditionMapRed.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Set number of reducers
        job.setNumReduceTasks(1);

        // Set output key/value classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Set input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Set external JAR
        // Path pfad = new Path("/ClassFiles/SampleAddition.jar");
        // job.addCacheFile(pfad.toUri());

        // Run job
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper
    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer
    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum two fixed values with the external class SampleAddition
            int value1 = 55;
            int value2 = 100;
            SampleAddition test = new SampleAddition(value1, value2);
            int resultFromClass = test.getResult();

            // Output
            result.set(resultFromClass);
            context.write(key, result);
        }
    }
}
On the first attempt I put the external JAR on my single-node cluster in the directory "/usr/lib/hadoop/". That worked, but for a big cluster this is not an option.
Then I tried to use the function job.addCacheFile(...), i.e. the following two lines:
Path pfad = new Path("/ClassFiles/SampleAddition.jar");
job.addCacheFile(pfad.toUri());
But now when I try to compile it, I get the following error:
/root/MapReduce/SampleAdditionMapRed.java:40: error: cannot find symbol
        job.addCacheFile(pfad.toUri());
           ^
  symbol:   method addCacheFile(URI)
  location: variable job of type Job
1 error
Most of the solutions I found on the internet are for Hadoop 1.x. I would really appreciate any ideas!
Edit, adding the compile commands:
javac -d CompileBin -classpath "/usr/lib/hadoop/*:/usr/lib/hadoop/client-0.20/*:/root/MapReduce/ClassFiles/SampleAddition.jar" /root/MapReduce/SampleAdditionMapRed.java
jar cvf SampleAdditionMapRed.jar -C CompileBin .
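One thing worth checking in that command: the classpath mixes /usr/lib/hadoop/* with the old 0.20-era client jars in /usr/lib/hadoop/client-0.20/*, so javac may resolve Job against a Hadoop 1.x jar that has no addCacheFile(URI). A sketch of a cleaner build, assuming the `hadoop` binary of the 2.5 installation is on the PATH (the other paths are taken from the question):

```shell
# `hadoop classpath` prints the jars of the installed Hadoop version,
# so only Hadoop 2.x classes end up on the compile classpath.
javac -d CompileBin \
      -classpath "$(hadoop classpath):/root/MapReduce/ClassFiles/SampleAddition.jar" \
      /root/MapReduce/SampleAdditionMapRed.java
jar cvf SampleAdditionMapRed.jar -C CompileBin .
```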
Answer 1 (score: 0)
I don't think there is a problem with the syntax. Just cross-check the package imports: Job must be
org.apache.hadoop.mapreduce.Job
and Path must be
org.apache.hadoop.fs.Path
If the old mapred-era Job is resolved instead (for example from the client-0.20 jars on the compile classpath), it has no addCacheFile(URI) method, which matches the compiler error.
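With the Hadoop 2.x imports, addCacheFile(URI) does exist on org.apache.hadoop.mapreduce.Job. Note, however, that the cache only ships the file to each task's working directory; for a JAR whose classes the tasks must load, Job also offers addFileToClassPath(Path). A minimal sketch, assuming the JAR has already been uploaded to HDFS at the path from the question:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "SampleAddition MapReduce");

        // Ships the file from HDFS to every task's working directory
        job.addCacheFile(new URI("/ClassFiles/SampleAddition.jar"));

        // Additionally puts the JAR on the tasks' classpath, which is what
        // a class like SampleAddition actually needs at runtime
        job.addFileToClassPath(new Path("/ClassFiles/SampleAddition.jar"));
    }
}
```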