Question

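I am trying to add cache files to my job. The files are split across a directory in an S3 bucket ("s3n:pathSomthing"), and I want to load them in the setup method of my Mapper class.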

I tried this code in main:

    job.addCacheFile(new URI(args[1])); // S3 path that holds the files

In the Mapper, my setup method looks like this:

    protected void setup(Context context) throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        FileSystem fs = FileSystem.get(conf);
        System.out.println("entering setup");
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null && cacheFiles.length > 0) {
            for (URI cacheFile : cacheFiles) {
                Path path = new Path(cacheFile.getPath());
                if (fs.exists(path)) {
                    FSDataInputStream in = fs.open(path);
                    readFile(in);
                    in.close();
                }
            }
        }
    }
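
One detail of the setup code that may matter here: `cacheFile.getPath()` returns only the path component of the URI, so the scheme and bucket are no longer part of the `Path` that is passed to `fs.exists`. The sketch below uses only `java.net.URI` to show what that call keeps and what it drops. The bucket and file names are hypothetical, chosen for illustration, and the helper names `pathOnly` and `schemeAndAuthority` are not from the original code.

```java
import java.net.URI;

public class CacheUriDemo {
    // Mirrors cacheFile.getPath() in setup(): only the path component survives.
    public static String pathOnly(String uri) {
        return URI.create(uri).getPath();
    }

    // The part that getPath() leaves out: the scheme and the authority (the bucket).
    public static String schemeAndAuthority(String uri) {
        URI u = URI.create(uri);
        return u.getScheme() + "://" + u.getAuthority();
    }

    public static void main(String[] args) {
        // Hypothetical cache file location, for illustration only.
        String example = "s3n://example-bucket/cache/part-00000";
        System.out.println(pathOnly(example));           // prints /cache/part-00000
        System.out.println(schemeAndAuthority(example)); // prints s3n://example-bucket
    }
}
```

If this is the cause, `FileSystem.get(conf)` would be looking up `/cache/part-00000` on the job's default file system rather than on S3. Hadoop also provides `FileSystem.get(URI, Configuration)` and `Path.getFileSystem(Configuration)`, which resolve the file system from the URI itself. Whether either is the right fit for this job is an assumption I have not verified.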

What is the correct way to open and read the files in full?

Thanks!

Importing files from an S3 bucket into a Hadoop program (in Java, Eclipse)

0 answers: