Question

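This is for a MapReduce job.

My input directory contains about 1000 files. Each file holds several GB of data.

For example, /MyFolder/MyResults/in_data/20140710/ contains 1000 files.

When I set the input path to /MyFolder/MyResults/in_data/20140710, the job processes all 1000 files.

I would like to run the job on only 200 files at a time. How can we do that?

This is the command I run: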

hadoop jar wholefile.jar com.form1.WholeFileInputDriver -libjars myref.jar -D mapred.reduce.tasks=15 /MyFolder/MyResults/in_data/20140710/ <<Output>>

Can someone help me run the job in batches of a fixed number of input files?

Thanks in advance

-Vim

Answer 1

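A simple approach is to modify the driver so that it adds only 200 of the files in that directory as input, instead of the whole directory. Something like this: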

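// Runs inside the driver, where "job" is the Job being configured.
FileSystem fs = FileSystem.get(new Configuration());
FileStatus[] files = fs.globStatus(new Path("/MyFolder/MyResults/in_data/20140710/*"));
// Add at most 200 files; Math.min guards against a directory with fewer entries.
int limit = Math.min(200, files.length);
for (int i = 0; i < limit; i++) {
    FileInputFormat.addInputPath(job, files[i].getPath());
}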
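
The snippet above covers only the first 200 files. To work through all 1000 in batches, one option is to split the file list into chunks and submit one job per chunk. Below is a minimal sketch of just the chunking step in plain Java. InputBatcher, partition and the part-N file names are made up for illustration and are not part of Hadoop; in a real driver the list would hold the paths returned by fs.globStatus(...).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): splits a list into fixed-size batches.
final class InputBatcher {

    // Returns consecutive sublists holding at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in file names (assumed); a driver would list the directory instead.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("/MyFolder/MyResults/in_data/20140710/part-" + i);
        }
        List<List<String>> batches = partition(files, 200);
        for (int b = 0; b < batches.size(); b++) {
            // In a driver: create a new Job here, call FileInputFormat.addInputPath
            // for every path in the batch, then wait for the job to complete.
            System.out.println("batch " + b + ": " + batches.get(b).size() + " files");
        }
    }
}
```

Each batch would then be its own job and would need its own output directory, because a MapReduce job fails at submission if its output directory already exists.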

Hadoop MapReduce: with a fixed number of input files?
