MapReduce job fails when run on YARN to delete data from an HBase table

Asked: 2017-08-22 15:55:52

Tags: hadoop mapreduce hbase yarn

I am using org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil to delete data from an HBase table. I wrote a main class (RollbackHandler) and start the job from there:

    def main(args: Array[String]) {
        val config = HBaseConfiguration.create()
        val job = new Job(config, "RollbackHandler")
        job.setJarByClass(classOf[RollBackMapper])
        // building the filter,
        // creating the scan, etc.
        //......
        //.....

        TableMapReduceUtil.initTableMapperJob(tableName, scan, classOf[RollBackMapper], null, null, job)
        job.setOutputFormatClass(classOf[NullOutputFormat[_ <: Writable, _ <: Writable]])
        job.setNumReduceTasks(0)

        logger.info("Starting RollbackHandler job for HBASE table: " + tableName)
        val status = job.waitForCompletion(true)
        exitStatus = if (status) 0 else 1
    }

Now I run it as follows:

java -classpath /opt/reflex/opt/tms/java/crux2.0-care1.0-jar-with-dependencies.jar:/opt/reflex/opt/tms/java/care-insta-api.jar:/opt/reflex/opt/tms/java/:/opt/reflex/opt/tms/java/care-acume-war/WEB-INF/lib/ RollbackHandler(fully_qualified_name_of_class)

This runs fine when the MapReduce job is launched in local mode. To run it on YARN, I added the following lines in the main() method:

config.set("mapreduce.framework.name", "yarn")
config.addResource(new Path("/opt/hadoop/conf/hdfs-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/mapred-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/yarn-site.xml"))

With this, the application is launched on YARN but fails with the following error:

Diagnostics:
Application application_1502881193709_0090 failed 2 times due to AM Container for appattempt_1502881193709_0090_000002 exited with exitCode: -1000. For more detailed output, check the application tracking page: http://RPM-VIP:8088/cluster/app/application_1502881193709_0090 Then, click on links to logs of each attempt.
Diagnostics: java.io.IOException: Resource file:/opt/reflex/opt/tms/java/crux2.0-care1.0-jar-with-dependencies.jar changed on src filesystem (expected 1476799531000, was 1476800106000)

Failing this attempt. Failing the application.

I thought it was a classpath issue, so I created an archive of all the jars and added the following line in the main method: job.addArchiveToClassPath(new Path("/opt/reflex/jar_archive.tar.gz"))

But the application still fails with the same error. Can somebody help? Your help is greatly appreciated!

Thanks, Suresh

1 Answer:

Answer 0 (score: 0)

Add all the XML files available in the Hadoop conf dir:

config.addResource(new Path("/opt/hadoop/conf/hdfs-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/mapred-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/core-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/yarn-site.xml"))
config.addResource(new Path("/opt/hadoop/conf/capacity-scheduler.xml"))
config.addResource(new Path("/opt/hadoop/conf/hadoop-policy.xml"))

Also, copy hbase-site.xml into the Hadoop classpath and restart YARN. Add hbase-site.xml to the configuration as below:

config.addResource(new Path("/opt/hadoop/conf/hbase-site.xml"))
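The "copy hbase-site.xml and restart YARN" step might look like the sketch below. The HBase conf location and the restart scripts are assumptions (they differ between distributions), so adjust the paths to your installation:

```shell
# Assumed source location of hbase-site.xml; adjust for your install.
cp /opt/hbase/conf/hbase-site.xml /opt/hadoop/conf/

# Restart the YARN daemons so they pick up the change
# (stock Apache Hadoop scripts; packaged distributions use service commands instead).
/opt/hadoop/sbin/stop-yarn.sh
/opt/hadoop/sbin/start-yarn.sh
```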

Add the .properties files to the job object as below:

job.addFileToClassPath(new Path("/opt/hadoop/conf/hadoop-metrics2.properties"))
job.addFileToClassPath(new Path("/opt/hadoop/conf/hadoop-metrics.properties"))
job.addFileToClassPath(new Path("/opt/hadoop/conf/httpfs-log4j.properties"))
job.addFileToClassPath(new Path("/opt/hadoop/conf/log4j.properties"))

These paths are also read from HDFS, so make sure the "/opt/hadoop/conf" used above is an HDFS path. I copied /opt/hadoop/conf from the local filesystem to HDFS. After that, the job ran successfully on YARN.
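The final copy-to-HDFS step can be sketched as below; the paths mirror the ones used above and are assumptions about this particular setup:

```shell
# Create the parent directory in HDFS, then upload the local conf directory
# so that /opt/hadoop/conf also resolves as an HDFS path for the job.
hdfs dfs -mkdir -p /opt/hadoop
hdfs dfs -put /opt/hadoop/conf /opt/hadoop/

# Verify the files landed where addResource/addFileToClassPath expect them.
hdfs dfs -ls /opt/hadoop/conf
```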