Question

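My map function has to read a file for every input. That file never changes; it is only read. The distributed cache could help me a lot, but I can't find a way to use it. The public void configure(JobConf conf) function that I would need to override is, I think, deprecated. Well, JobConf is definitely deprecated. All the DistributedCache tutorials use the deprecated way. What can I do? Is there another configure function that I can override?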

These are the first lines of my map function:

     Configuration conf = new Configuration();          //load the MFile
     FileSystem fs = FileSystem.get(conf);
     Path inFile = new Path("planet/MFile");       
     FSDataInputStream in = fs.open(inFile);
     DecisionTree dtree=new DecisionTree().loadTree(in);

I want to cache that MFile so that my map function doesn't have to read it over and over again.

Answer 1

I think I got it working. I followed Ravi Bhatt's hint and wrote this:

  @Override
  protected void setup(Context context) throws IOException, InterruptedException
  {      
      FileSystem fs = FileSystem.get(context.getConfiguration());
      URI files[]=DistributedCache.getCacheFiles(context.getConfiguration());
      Path path = new Path(files[0].toString());
      in = fs.open(path);
      dtree=new DecisionTree().loadTree(in);                 
  }

In my main method, I do this to add the file to the cache:

  DistributedCache.addCacheFile(new URI(args[0]+"/"+"MFile"), conf);
  Job job = new Job(conf, "MR phase one");

I am able to retrieve the file I need this way, but I can't tell whether it is working 100% correctly. Is there a way to test it? Thanks.
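One possible way to check is to count how often the load actually runs. The sketch below is Hadoop-free and only illustrates the idea: `CountingLoader` and `CacheLoadCheck` are hypothetical names, and the string "model" stands in for the `DecisionTree` read from MFile. In a real job, incrementing a Hadoop counter from `setup` (for example via `context.getCounter(group, name).increment(1)`) could play the same role, and the total would then be expected to match the number of map tasks rather than the number of input records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: wraps any loader and counts how often it is invoked.
class CountingLoader<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final AtomicInteger loads = new AtomicInteger();

    CountingLoader(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        loads.incrementAndGet();
        return delegate.get();
    }

    int loadCount() {
        return loads.get();
    }
}

class CacheLoadCheck {
    // Stands in for the work map() does with the loaded model.
    static String use(String model, String record) {
        return model + ":" + record;
    }

    // Original approach: the file is opened and parsed inside every map call.
    static int loadsWhenReadPerRecord(List<String> records) {
        CountingLoader<String> loader = new CountingLoader<>(() -> "model");
        for (String record : records) {
            use(loader.get(), record); // one load per record
        }
        return loader.loadCount();
    }

    // setup()-style approach: load once, then reuse for every record.
    static int loadsWhenReadInSetup(List<String> records) {
        CountingLoader<String> loader = new CountingLoader<>(() -> "model");
        String model = loader.get(); // single load, as setup() would do
        for (String record : records) {
            use(model, record);
        }
        return loader.loadCount();
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3");
        System.out.println("loads per record: " + loadsWhenReadPerRecord(records));
        System.out.println("loads in setup:   " + loadsWhenReadInSetup(records));
    }
}
```

If the count grows with the number of records, the load is still happening in the map path; if it stays at one per task, the setup-based version is doing what it should.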

Answer 2

JobConf was deprecated in 0.20.x, but it is not deprecated in 1.0.0! :-) (as of this writing)

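As for your question, there are two ways to write map reduce jobs in Java: one is by extending classes in the org.apache.hadoop.mapreduce package, and the other is by implementing classes in the org.apache.hadoop.mapred package (or the other way round).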

I'm not sure which one you are using. If you don't have a configure method to override, you will get a setup method to override instead.

  @Override
  protected void setup(Context context) throws IOException, InterruptedException

This is similar to configure and should help you.

You get a setup method to override when you extend the Mapper class in the org.apache.hadoop.mapreduce package.
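The reason setup is useful here is the order in which the framework calls things: setup once per task, then map once per record, then cleanup. The following is a minimal, Hadoop-free sketch of that order. `MiniMapper`, `RecordingMapper`, and `LifecycleDemo` are hypothetical names made up for illustration and are not Hadoop classes, and the string "tree" stands in for the loaded DecisionTree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper base class; this is not Hadoop's Mapper.
abstract class MiniMapper {
    protected void setup() {}

    protected abstract void map(String record);

    protected void cleanup() {}

    // Call order modeled on Mapper.run: setup, then map per record, then cleanup.
    final void run(Iterable<String> records) {
        setup();
        for (String record : records) {
            map(record);
        }
        cleanup();
    }
}

// Records each lifecycle call so the order can be inspected.
class RecordingMapper extends MiniMapper {
    final List<String> events = new ArrayList<>();
    private String model; // stands in for the DecisionTree field

    @Override
    protected void setup() {
        model = "tree"; // the expensive load happens here, once
        events.add("setup");
    }

    @Override
    protected void map(String record) {
        events.add("map(" + record + ") using " + model);
    }

    @Override
    protected void cleanup() {
        events.add("cleanup");
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        RecordingMapper mapper = new RecordingMapper();
        mapper.run(Arrays.asList("a", "b"));
        mapper.events.forEach(System.out::println);
    }
}
```

Anything stored in a field during setup, such as the loaded tree, is then available to every map call in that task without being read again.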

Hadoop cache file for all map tasks
