Question

I am trying to put a file in the distributed cache. To do this, I invoke my driver class with the -files option, like so:

   hadoop jar job.jar my.driver.class -files MYFILE input output

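getCacheFiles() and getLocalCacheFiles() return arrays of URIs/paths that contain MYFILE (e.g. hdfs://localhost/tmp/hadoopuser/mapred/staging/knappy/.staging/job_201208262359_0005/files/histfile#histfile).

Unfortunately, trying to retrieve MYFILE in the map task throws a FileNotFoundException.

I have tried this in standalone (local) mode as well as in pseudo-distributed mode.

Do you know what the cause might be?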

Update

The following lines:

System.out.println("cache files:" + ctx.getConfiguration().get("mapred.cache.files"));
uris = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
for (Path uri : uris) {
    System.out.println(uri.toString());
    System.out.println(uri.getName());
    if (uri.getName().contains(Constants.PATH_TO_HISTFILE)) {
        histfileName = uri.getName();
    }
}

print:

cache files:file:/home/knappy/histfile#histfile

/tmp/hadoop-knappy/mapred/local/archive/-7231_-1351_105/file/home/knappy/histfile

histfile

So the file appears to be listed in the mapred.cache.files property of job.xml, and the local file appears to exist. Still, a FileNotFoundException is thrown.

Answer 1

First, check mapred.cache.files in the job XML to see whether the file is in the cache. You can then retrieve it in the mapper:

...
Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
File myFile = new File(files[0].getName());
//read your file content
...
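One thing to watch for, as a guess based on the output in the question: Path.getName() returns only the last path component ("histfile"), so opening that name works only if a file or symlink called histfile exists in the task's working directory. If it does not, opening the full local path printed by getLocalCacheFiles() is a possible workaround. Below is a minimal plain-Java sketch of that idea. The class and method names (CacheFileLookup, findByName, readLines) are hypothetical helpers, not part of Hadoop, and the sketch assumes you pass in the strings obtained by calling toString() on each Path in the array.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hadoop API.
class CacheFileLookup {

    // Returns the FULL path of the first entry whose last component equals
    // wantedName, or null if there is no such entry.
    static String findByName(String[] localPaths, String wantedName) {
        for (String p : localPaths) {
            if (new File(p).getName().equals(wantedName)) {
                return p; // keep the whole path, not just the file name
            }
        }
        return null;
    }

    // Reads every line of the file at fullPath into a list.
    static List<String> readLines(String fullPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(fullPath))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In the mapper you would build the String array from the Path[] returned by DistributedCache.getLocalCacheFiles(...), call findByName(paths, "histfile"), and read from the returned path instead of from the bare name.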
