Question

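I have bundled a file, "xxx.txt.gz", into my fat jar.

I need to reference this file in every YARN container, in every map task.

So, if you look inside my jar:

you will see xxx.txt.gz

I am trying to access this file via: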

File mappingFile = new File(getClass().getClassLoader().getResource("xxx.txt.gz").getFile());

However, at runtime I get the following error in the logs of every task attempt:

java.io.FileNotFoundException: file:/local/hadoop/1/yarn/local/usercache/USER/appcache/application_1431608807540_0071/filecache/10/job.jar/job.jar!/xxx.txt.gz (No such file or directory)

In other words, even though my fat jar contains the file, job.jar apparently does not.

How can I resolve this?

Many thanks in advance.

Answer 1

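There is another way to access files from Mappers / Reducers, and it may be a better fit for MapReduce.

You can use the Distributed Cache option that MapReduce provides. With it, Hadoop distributes the file to every container in which the job's Mappers / Reducers will run.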

Answer 2

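I actually realized that DistributedCache is deprecated in Hadoop 2.7. However, small utility / lookup files can be added to HDFS and then loaded into the Mapper / Reducer JVMs with the regular file system mechanisms.

For example: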

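import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void setup(Context context) throws IOException {
   // Reads the location from the job config, so this handles the file being on the local FS or on HDFS
   Configuration jobConf = context.getConfiguration();
   Path filePath = new Path(jobConf.get("my.mapping.file"));
   InputStream in = FileSystem.get(jobConf).open(filePath); // read the lookup data from in, then close it
}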
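
Once the file is open as a stream, the gzipped lookup data still has to be parsed. The following is only a minimal sketch of that step using the plain JDK. MappingLoader is a hypothetical helper, not a Hadoop class, and the one "key<TAB>value" pair per line layout is an assumption, since the question does not say what xxx.txt.gz contains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper (not part of Hadoop): parses a gzipped lookup file
// from any InputStream into an in-memory map.
final class MappingLoader {

    private MappingLoader() {}

    // Assumption: one "key<TAB>value" pair per line, UTF-8 encoded.
    // The real layout of xxx.txt.gz is not given in the question.
    static Map<String, String> load(InputStream raw) throws IOException {
        Map<String, String> mapping = new HashMap<>();
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip lines that do not match the assumed layout
                }
                mapping.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return mapping;
    }
}
```

In setup, the stream returned by FileSystem.get(jobConf).open(filePath) could be passed straight to MappingLoader.load. The same helper should also accept a stream from getClass().getClassLoader().getResourceAsStream("xxx.txt.gz"), which reads the entry from inside the jar rather than treating it as a path on disk.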

MapReduce: reading a file from the classpath in tasks
