I want to submit my MR job using the YARN Java API. I tried to follow WritingYarnApplications, but I do not know what to put into the amContainer. Below is the code I have written:
package org.apache.hadoop.examples;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.GetNewApplicationResponse;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;
import org.mortbay.util.ajax.JSON;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class YarnJob {
    private static Logger logger = LoggerFactory.getLogger(YarnJob.class);

    public static void main(String[] args) throws Throwable {
        Configuration conf = new Configuration();
        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();

        System.out.println(JSON.toString(client.getAllQueues()));
        System.out.println(JSON.toString(client.getConfig()));
        //System.out.println(JSON.toString(client.getApplications()));
        System.out.println(JSON.toString(client.getYarnClusterMetrics()));

        YarnClientApplication app = client.createApplication();
        GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
        ApplicationId appId = appResponse.getApplicationId();

        // Create launch context for app master
        ApplicationSubmissionContext appContext = Records.newRecord(ApplicationSubmissionContext.class);
        // set the application id
        appContext.setApplicationId(appId);
        // set the application name
        appContext.setApplicationName("test");
        // Set the queue to which this application is to be submitted in the RM
        appContext.setQueue("default");

        // Set up the container launch context for the application master
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        //amContainer.setLocalResources();
        //amContainer.setCommands();
        //amContainer.setEnvironment();
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1));
        appContext.setApplicationType("MAPREDUCE");

        // Submit the application to the applications manager
        client.submitApplication(appContext);

        //client.stop();
    }
}
I can run the MapReduce job correctly from the command line:
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
But how do I submit this wordcount job through the YARN Java API?
Answer 0 (score: 1)
You do not submit the job with the YarnClient; you submit it with the MapReduce API. See this link for an example.
However, if you need more control over the job, for example to get the completion status or the status of the mapper and reducer phases, you can use
job.submit();
instead of
job.waitForCompletion(true)
You can use the functions job.mapProgress() and job.reduceProgress() to get the status. There are many more useful methods on the Job object.
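For concreteness, here is a minimal sketch of such a driver for the wordcount job, assuming the TokenizerMapper and IntSumReducer classes from the stock org.apache.hadoop.examples.WordCount example are available on the classpath; the class name WordCountSubmitter is a placeholder, and the input/output paths are taken from the command in the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.WordCount;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSubmitter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        // Ship the jar that contains this driver (and the job classes) to the cluster.
        job.setJarByClass(WordCountSubmitter.class);
        // Mapper/combiner/reducer reused from the stock WordCount example (assumption).
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/admin/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/admin/output"));
        // Either block until completion:
        //   System.exit(job.waitForCompletion(true) ? 0 : 1);
        // or submit asynchronously and poll the progress yourself:
        job.submit();
        System.out.println("Submitted job " + job.getJobID());
    }
}

With job.submit() the call returns immediately, and the map/reduce progress can then be polled as shown in the driver further below.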
As for your query:
hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount /user/admin/input /user/admin/output/
What happens here is that you are running the driver program provided in wordcount.jar. Instead of doing "java -jar wordcount.jar" you are using "hadoop jar wordcount.jar". You could also use "yarn jar wordcount.jar". Compared to the java -jar command, Hadoop/YARN sets up the additional classpath that is needed. This executes the "main()" of the driver program provided in the org.apache.hadoop.examples.WordCount class specified in the command.
You can check out the source at Source for WordCount class.
I think the only reason you want to submit the job through YARN is to integrate it with some kind of service that kicks off MapReduce2 jobs on certain events.
For that you can always have your driver's main() like this:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.StringUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyMapReduceDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        /******/
        int errCode = ToolRunner.run(conf, new MyMapReduceDriver(), args);
        System.exit(errCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // Keep listening for events and launch a job for each one.
        while (true) {
            try {
                runMapReduceJob();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    private void runMapReduceJob() throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        /******/
        job.submit();
        // Poll the status until the job leaves the PREP/RUNNING states.
        while (job.getJobState() == JobStatus.State.RUNNING
                || job.getJobState() == JobStatus.State.PREP) {
            Thread.sleep(1000);
            System.out.println(" Map: " + StringUtils.formatPercent(job.mapProgress(), 0)
                    + " Reducer: " + StringUtils.formatPercent(job.reduceProgress(), 0));
        }
    }
}
Hope this helps.