Reading files in a directory with MapReduce

Time: 2017-02-13 08:48:35

Tags: amazon-web-services amazon-s3 mapreduce emr amazon-emr

My S3 directory looks like this:

/sssssss/xxxxxx/rrrrrr/xx/file1
/sssssss/xxxxxx/rrrrrr/xx/file2
/sssssss/xxxxxx/rrrrrr/xx/file3
/sssssss/xxxxxx/rrrrrr/yy/file4
/sssssss/xxxxxx/rrrrrr/yy/file5
/sssssss/xxxxxx/rrrrrr/yy/file6

How can my MapReduce program read these files from S3?

2 answers:

Answer 0: (score: 0)

For a single input path, you can do the following:

FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/xx/"));

For two input paths, do the following:

FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/xx/"));
FileInputFormat.addInputPath(job, new Path("/sssssss/xxxxxx/rrrrrr/yy/"));

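Alternatively, use addInputPaths(), which takes a comma-separated list of paths. For more details, see the documentation of FileInputFormat (the exact API depends on your Hadoop version).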
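
As a rough illustration of the addInputPaths() variant: the method takes one comma-separated string of paths, so the two directories can be joined into that string first. The names InputPathList and join below are hypothetical helpers, not part of Hadoop, and the Hadoop call itself appears only in a comment so that the snippet compiles without Hadoop on the classpath.

```java
import java.util.List;

// Hypothetical helper (not part of Hadoop): builds the comma-separated
// path string that FileInputFormat.addInputPaths(job, String) accepts.
final class InputPathList {
    static String join(String base, List<String> subDirs) {
        // Make sure the base path ends with a slash before appending.
        String root = base.endsWith("/") ? base : base + "/";
        StringBuilder sb = new StringBuilder();
        for (String dir : subDirs) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(root).append(dir).append('/');
        }
        return sb.toString();
    }
}

// Assumed usage inside a job driver:
//   String paths = InputPathList.join("/sssssss/xxxxxx/rrrrrr",
//                                     java.util.Arrays.asList("xx", "yy"));
//   FileInputFormat.addInputPaths(job, paths);
```

For the directories in the question, join returns "/sssssss/xxxxxx/rrrrrr/xx/,/sssssss/xxxxxx/rrrrrr/yy/".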

Answer 1: (score: 0)

This can be simplified as follows:

FileInputFormat.setInputDirRecursive(job, true);
FileInputFormat.addInputPaths(job, args[0]);

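You only need to supply the base path of the S3 directory, not the exact location of each file. With recursive input enabled, the job descends through the subdirectories until it reaches the directories that contain the files.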
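
To make the effect of the recursive setting concrete, here is a conceptual sketch in plain Java. It is not how Hadoop implements the traversal; it only shows which files end up selected when a base path is given instead of each leaf directory. RecursiveSelection and filesUnder are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual illustration only: selects every file path that lies
// under a base directory, at any depth. Not Hadoop's implementation.
final class RecursiveSelection {
    static List<String> filesUnder(String base, List<String> allFiles) {
        // Compare against the base with a trailing slash so that
        // "/a/b" does not accidentally match "/a/bc/file".
        String root = base.endsWith("/") ? base : base + "/";
        List<String> selected = new ArrayList<>();
        for (String file : allFiles) {
            if (file.startsWith(root)) {
                selected.add(file);
            }
        }
        return selected;
    }
}
```

Given the six files from the question, a base of /sssssss/xxxxxx/rrrrrr selects all six, while /sssssss/xxxxxx/rrrrrr/xx selects only file1 through file3.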