Question

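I have the following requirements:

There is a 30-node Hadoop YARN cluster and a client machine used for submitting jobs.
Let's use the wordcount MR example, since it is world-famous. I want to submit and run the wordcount MR job from a Java method.

So what code is needed to submit the job? Does anything specific need to be configured on the client machine?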

Answer 1

Hadoop should be installed on the client machine, with the same configuration as the other machines in the Hadoop cluster.
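As a rough illustration of what "the same configuration" means for job submission, the sketch below sets the three client-side properties that usually matter most directly on a Hadoop `Configuration`: `fs.defaultFS` (normally in core-site.xml), `mapreduce.framework.name` (mapred-site.xml) and `yarn.resourcemanager.hostname` (yarn-site.xml). The class name `ClusterClientConfig`, the host names and the port 8020 are hypothetical placeholders. In practice, copying the cluster's `*-site.xml` files onto the client's classpath achieves the same thing, and a secured (Kerberos) or HA cluster needs more properties than shown here.

```java
import org.apache.hadoop.conf.Configuration;

public class ClusterClientConfig {

    // Builds a client-side Configuration pointing at the cluster.
    // Host names and ports are hypothetical; take the real values from the
    // cluster's core-site.xml, mapred-site.xml and yarn-site.xml.
    static Configuration create(String nameNodeHost, int nameNodePort, String resourceManagerHost) {
        Configuration conf = new Configuration();
        // HDFS NameNode that relative input/output paths resolve against
        conf.set("fs.defaultFS", "hdfs://" + nameNodeHost + ":" + nameNodePort);
        // Submit MapReduce jobs to YARN instead of running them in-process
        conf.set("mapreduce.framework.name", "yarn");
        // ResourceManager host; its default service addresses are derived from it
        conf.set("yarn.resourcemanager.hostname", resourceManagerHost);
        return conf;
    }

    public static void main(String[] args) {
        // Hypothetical host names, for illustration only
        Configuration conf = create("namenode.example.com", 8020, "resourcemanager.example.com");
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```

Passing such a `Configuration` to `Job.getInstance(conf)` directs the submitted job to that cluster's ResourceManager.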

To submit an MR job from a Java method, refer to Java's ProcessBuilder and use it to pass the hadoop command that launches the wordcount example.

The command and the necessary application-specific requirements for wordcount can be found here.
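A minimal sketch of the ProcessBuilder approach, assuming the `hadoop` launcher script is on the client's `PATH` and that the examples jar sits at the hypothetical path used below (the real file name includes the Hadoop version). The class name `WordCountLauncher` is also hypothetical. The child process's exit code is returned so the caller can tell whether the job succeeded.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class WordCountLauncher {

    // Builds the "hadoop jar <examples jar> wordcount <in> <out>" command line
    static List<String> buildCommand(String examplesJar, String input, String output) {
        return Arrays.asList("hadoop", "jar", examplesJar, "wordcount", input, output);
    }

    // Runs the command as a child process and waits for it; returns its exit code
    static int launch(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO(); // forward the job's console output to this JVM's stdout/stderr
        Process process = pb.start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCountLauncher <input path> <output path>");
            System.exit(2);
        }
        // Hypothetical location of the examples jar; adjust to the actual installation
        String jar = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar";
        System.exit(launch(buildCommand(jar, args[0], args[1])));
    }
}
```

The trade-off is that the Java code only sees an exit code and console output; submitting through the `Job` API gives programmatic access to job status and counters.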

Answer 2

You should create a class that implements Tool. Here is an example:

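import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// ToolRunner parses generic Hadoop options (-D, -conf, -fs, -jt)
// before calling run() with the remaining arguments.
public class AggregateJob extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Job.getInstance replaces the deprecated new Job(Configuration) constructor
    Job job = Job.getInstance(getConf());
    job.setJarByClass(getClass());
    job.setJobName(getClass().getSimpleName());

    // args[0] = input path, args[1] = output path (must not exist yet)
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // ProjectionMapper comes from the original example and is not shown here;
    // it must emit (Text, LongWritable) pairs for LongSumReducer to sum
    job.setMapperClass(ProjectionMapper.class);
    job.setCombinerClass(LongSumReducer.class);
    job.setReducerClass(LongSumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // Submit the job and block until it finishes, printing progress
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int rc = ToolRunner.run(new AggregateJob(), args);
    System.exit(rc);
  }
}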

This example comes from here. As @hamsa-zafar already said, the client machine should have the Hadoop configuration, just like any other node in the cluster.
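The same `Tool` pattern applied to the wordcount job from the question might look like the sketch below. Instead of a custom mapper it uses Hadoop's built-in `TokenCounterMapper` (emits each whitespace-separated token with a count of 1) and `IntSumReducer` (sums the counts per key), so no user-defined map or reduce class is required. The class name `WordCount` and the `validArgs` helper are illustrative, not taken from the linked example.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    // Illustrative helper: expects exactly <input path> <output path>
    static boolean validArgs(String[] args) {
        return args.length == 2;
    }

    @Override
    public int run(String[] args) throws Exception {
        if (!validArgs(args)) {
            System.err.println("Usage: WordCount <input path> <output path>");
            return 2;
        }
        Job job = Job.getInstance(getConf(), "wordcount");
        // Locate the jar containing this class so it can be shipped to the cluster
        job.setJarByClass(WordCount.class);

        // Built-in mapper: tokenizes each line and emits (word, 1)
        job.setMapperClass(TokenCounterMapper.class);
        // Built-in reducer: sums the counts per word; also safe as a combiner
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits to the cluster named in the configuration and waits for completion
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args));
    }
}
```

Because it goes through `ToolRunner`, cluster settings can also be supplied on the command line as generic options, e.g. `-D mapreduce.framework.name=yarn` or `-conf <file>`. When launched from an IDE, where the classes are not packaged in a jar, `setJarByClass` cannot find a jar to ship; `job.setJar(...)` pointing at a built jar is the usual workaround.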

How to run a wordcount job on Hadoop YARN from Java code?
