Hadoop MapReduce compile error with .addCacheFile (using an external JAR)

Asked: 2015-03-22 16:25:13

Tags: java hadoop compilation mapreduce

After several hours of trying to pull an external JAR into my code, I am out of luck, so maybe someone can help me. I am using Hadoop 2.5.

This is the external JAR I am trying to use:

public class SampleAddition {
  private int firstVariable;
  private int secondVariable;

  public SampleAddition(int firstVariable, int secondVariable) {
      this.firstVariable = firstVariable;
      this.secondVariable = secondVariable;
  }

  public int getResult(){
      int result = firstVariable + secondVariable;
      return result;
  }
}
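As a quick sanity check of the helper (not part of the question), here is a standalone sketch with the class inlined so it compiles on its own:

```java
public class SampleAdditionDemo {
    // The SampleAddition helper from above, nested so this file is self-contained.
    static class SampleAddition {
        private final int firstVariable;
        private final int secondVariable;

        SampleAddition(int firstVariable, int secondVariable) {
            this.firstVariable = firstVariable;
            this.secondVariable = secondVariable;
        }

        int getResult() {
            return firstVariable + secondVariable;
        }
    }

    public static void main(String[] args) {
        // Same values the reducer in the question uses: 55 + 100
        System.out.println(new SampleAddition(55, 100).getResult()); // prints 155
    }
}
```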

For the MapReduce code I used the simple WordCount example:

import java.io.IOException;
import java.net.URI;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SampleAdditionMapRed {

 // Main-Method
 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "SampleAddition MapReduce");

    // Set Classes
    job.setJarByClass(SampleAdditionMapRed.class);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    // Set Number of Reducer
    job.setNumReduceTasks(1);

    // Set Output and Input Parameters
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Set FileDestination
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Set external Jar
    // Path pfad = new Path("/ClassFiles/SampleAddition.jar");
    // job.addCacheFile(pfad.toUri());

    // Run Job
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}


// Mapper
public static class MyMapper extends
        Mapper<Object, Text, Text, IntWritable> {

    // Initialize Variables
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Declare map method
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

// Reducer
public static class MyReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    // Declare Reduce-Method
    public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {

        // Set SampleAddition Class
        int value1 = 55;
        int value2 = 100;

        // Sum two Values with Class SampleAddition
        SampleAddition test = new SampleAddition(value1, value2);

        // Return summarized values
        int resultFromClass = 0;
        resultFromClass = test.getResult();

        // Output
        result.set(resultFromClass);
        context.write(key, result);
    }
  }
}

On a first attempt, I put the external jar on my single-node cluster in the directory "/usr/lib/hadoop/". That worked, but for a large cluster it is not an option.

Then I tried to use the function job.addCacheFile(...), i.e. the following two lines:

// Path pfad = new Path("/ClassFiles/SampleAddition.jar");
// job.addCacheFile(pfad.toUri());

But now when I try to compile it, I get the following error:

/root/MapReduce/SampleAdditionMapRed.java:40: error: cannot find symbol
                job.addCacheFile(pfad.toUri());
                   ^
  symbol:   method addCacheFile(URI)
  location: variable job of type Job
1 error

Most of the solutions I found on the internet are for Hadoop 1.x. I would really appreciate any ideas!


Edit: adding the compile commands:

javac -d CompileBin -classpath "/usr/lib/hadoop/*:/usr/lib/hadoop/client-0.20/*:/root/MapReduce/ClassFiles/SampleAddition.jar" /root/MapReduce/SampleAdditionMapRed.java    
jar cvf SampleAdditionMapRed.jar -C CompileBin .
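One way to see why the compiler rejects addCacheFile is to inspect the Job class that is actually on the compile classpath: the command above mixes /usr/lib/hadoop/client-0.20/* (Hadoop 1.x-era client jars, whose Job class likely has no addCacheFile) with Hadoop 2 jars. javap, shipped with the JDK, lists the methods a compiled class really declares. A sketch (demonstrated on a JDK class so it runs anywhere; the Hadoop jar path below is an assumption for your install):

```shell
# javap lists the public members of a compiled class.
# Demonstrated on a JDK class:
javap java.net.URI | grep 'public java.net.URI('

# For the Hadoop case, point it at the jar you compile against
# (path is an assumption -- adjust for your install):
# javap -classpath /usr/lib/hadoop/hadoop-mapreduce-client-core.jar \
#       org.apache.hadoop.mapreduce.Job | grep addCacheFile
```

If grep prints nothing for addCacheFile, the Job class on that classpath entry predates Hadoop 2 and the "cannot find symbol" error follows.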

2 Answers:

Answer 0 (score: 0)

In Hadoop Gen 2 you can do it like this:

DistributedCache.addCacheFile(..);

For an example, see here.

Answer 1 (score: 0)

I don't think there is a problem with the syntax. Just cross-check the package imports.

Job:

org.apache.hadoop.mapreduce.Job 

and Path:

org.apache.hadoop.fs.Path
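The same cross-check can be done at runtime with reflection: ask the loaded class whether it actually declares the method you are about to call. A minimal sketch, demonstrated on a JDK class since the Hadoop jars may not be on the local classpath (on the cluster you would pass org.apache.hadoop.mapreduce.Job instead):

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class MethodCheck {
    // True if the class (or a supertype) exposes a public method with this name.
    static boolean hasMethod(Class<?> cls, String name) {
        return Arrays.stream(cls.getMethods())
                     .anyMatch((Method m) -> m.getName().equals(name));
    }

    public static void main(String[] args) {
        // Stand-in for Class.forName("org.apache.hadoop.mapreduce.Job"):
        System.out.println(hasMethod(java.net.URI.class, "toURL"));        // true
        System.out.println(hasMethod(java.net.URI.class, "addCacheFile")); // false
    }
}
```

If the method is missing at runtime too, the wrong jar version is being picked up, not a syntax problem.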