Question

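I created a jar that runs a MapReduce job and writes its output to a directory. From my Java code, which does not run inside the Hadoop environment, I need to read the data in that output directory without copying it to a local directory first. I am using ProcessBuilder to run the jar. Can anyone help me?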

Answer 1

You can add the following code to your MR driver to read the job's output once it finishes.

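    // conf is the job's Configuration, output is the Path passed to
    // FileOutputFormat.setOutputPath(), and OutputFilesFilter is
    // org.apache.hadoop.mapred.Utils.OutputFileUtils.OutputFilesFilter
    job.waitForCompletion(true);
    FileSystem fs = FileSystem.get(conf);
    // List only the part-* files, skipping _SUCCESS and _logs
    Path[] outputFiles = FileUtil.stat2Paths(fs.listStatus(output, new OutputFilesFilter()));

    for (Path file : outputFiles) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(file)));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Process each output record (key<TAB>value by default)
                System.out.println(line);
            }
        } finally {
            reader.close();
        }
    }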
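As far as I know, `OutputFilesFilter` is what keeps the `_SUCCESS` marker and the `_logs` directory out of the listing, so only the `part-*` files get opened. If you list the directory yourself (for example from a program that is not the driver), a common and slightly broader rule, the one `FileInputFormat` uses for hidden files, is to skip any name starting with `_` or `.`. Below is a minimal, Hadoop-free sketch of that rule; `selectOutputFiles` is a hypothetical helper, not a Hadoop API. In real code you would apply it to `FileStatus.getPath().getName()` or wrap it in a `PathFilter`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OutputFileSelector {

    // Hypothetical helper: keep only names that look like job output.
    // Names starting with "_" (e.g. _SUCCESS, _logs) or "." (hidden or
    // checksum files) are treated as bookkeeping and skipped.
    static List<String> selectOutputFiles(List<String> names) {
        List<String> selected = new ArrayList<String>();
        for (String name : names) {
            if (!name.startsWith("_") && !name.startsWith(".")) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "_SUCCESS", "_logs", "part-r-00000", "part-r-00001", ".part-r-00000.crc");
        // Prints [part-r-00000, part-r-00001]
        System.out.println(selectOutputFiles(listing));
    }
}
```

Filtering by name keeps the helper independent of the Hadoop version; the trade-off is that it also drops any real output file you deliberately named with a leading `_` or `.`.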

Answer 2

What's wrong with just reading the data from HDFS through the HDFS API?

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Load the cluster's config so FileSystem.get() returns HDFS, not the local FS
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        conf.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(conf);
        // Wrap the stream in a BufferedReader; DataInputStream.readLine() is deprecated
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/mapout/input.txt"))));
        try {
            System.out.println(reader.readLine());   // first line of the file
        } finally {
            reader.close();
        }
    }
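The example above prints only the first line. `fs.open(...)` returns an `FSDataInputStream`, which is an ordinary `java.io.InputStream`, so the whole file can be read with a standard `BufferedReader` loop. Here is a minimal sketch with a hypothetical `readAllLines` helper. It is demonstrated on a `ByteArrayInputStream` so it runs without Hadoop; passing `fs.open(new Path("/mapout/input.txt"))` instead should behave the same way. It assumes the file is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HdfsLineReader {

    // Hypothetical helper: read every line from a stream such as the
    // FSDataInputStream returned by FileSystem.open(), then close it.
    static List<String> readAllLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for fs.open(...): tab-separated key/value lines,
        // which is what TextOutputFormat writes by default
        InputStream in = new ByteArrayInputStream(
            "apple\t3\nbanana\t5\n".getBytes(StandardCharsets.UTF_8));
        for (String line : readAllLines(in)) {
            System.out.println(line);
        }
    }
}
```

For very large output files you would process each line inside the loop instead of collecting them all into a list.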

Your program can run outside your Hadoop cluster, but the Hadoop daemons must be running.
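Since the question launches the jar with ProcessBuilder, the calling program should wait for that process to exit, and check its exit code, before opening the output directory; otherwise it may read incomplete output. Below is a minimal sketch with a hypothetical `runAndWait` helper. It assumes your driver exits non-zero on failure, as drivers that call `System.exit(job.waitForCompletion(true) ? 0 : 1)` do. The `hadoop jar` command, jar name, and paths in the comment are placeholders; the demo runs the current JVM's `java -version` so it works without Hadoop installed.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class JobLauncher {

    // Hypothetical helper: start an external command (e.g. "hadoop jar ..."),
    // forward its stdout/stderr to this process, and block until it exits.
    static int runAndWait(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO();              // show the job's progress and logs on our console
        Process process = pb.start();
        return process.waitFor();    // the child's exit code
    }

    public static void main(String[] args) throws Exception {
        // In practice this would be something like (placeholders):
        //   Arrays.asList("hadoop", "jar", "myjob.jar", "/input", "/mapout")
        String java = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        int exitCode = runAndWait(Arrays.asList(java, "-version"));
        if (exitCode != 0) {
            throw new IllegalStateException("Job failed with exit code " + exitCode);
        }
        // Only after a successful exit is it safe to read the output through the HDFS API
        System.out.println("Job finished, exit code " + exitCode);
    }
}
```

`inheritIO()` is used so the child's output never fills an unread pipe buffer; if you need to capture that output instead, read `process.getInputStream()` and `process.getErrorStream()` before calling `waitFor()`.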

How do I get output data from Hadoop?
