How do I end or fail an AWS Glue job with an error?

Asked: 2018-01-30 10:02:34

Tags: scala amazon-web-services aws-glue

Consider this snippet from an AWS Glue job:

// Read the source table from the Glue Data Catalog.
val input = glueContext
  .getCatalogSource(database = "my_db", tableName = "my_table")
  .getDynamicFrame()

val myLimit = 10
if (input.count() <= myLimit) {
  // end glue job here with error
}
// continue execution

How do I exit the job with an error status? If I simply skip the rest of the execution, the run ends as a success; if I throw an exception, it fails, but with an exception. Is there something I can call that stops the job in a failed/error state without throwing an exception?
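
For reference, the exception-based route mentioned above would look roughly like this (a minimal sketch; the run is marked as failed, but the error output carries a stack trace):

if (input.count() <= myLimit) {
  // Throwing from the driver fails the Glue run, at the cost of
  // the failure surfacing as an exception with a stack trace in the logs.
  throw new IllegalStateException(
    s"Input has at most $myLimit rows; aborting the job.")
}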

Update

At first glance, it seems I could do:

import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.GlueArgParser
import org.apache.spark.SparkContext

val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)
val jobId = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_ID").toArray)("JOB_ID")
spark.cancelJob(jobId) // see the caveats below

However:

  1. The SparkContext here belongs to the underlying framework, and cancelling jobs through it could lead to unpredictable (unstable) results.
  2. org.apache.spark.SparkContext#cancelJob takes an Int, whereas the AWS Glue JOB_ID is a String such as j_aaa11111a1a11a111a1aaa11a11111aaa11a111a1111111a111a1a1aa111111a, so it cannot be passed to cancelJob directly, as illustrated below.
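
To illustrate point 2, SparkContext#cancelJob is declared with an Int parameter, so the Glue run ID string cannot even be passed in:

// org.apache.spark.SparkContext declares, roughly:
//   def cancelJob(jobId: Int): Unit
val glueRunId = "j_aaa11111a1a11a111a1aaa11a11111aaa11a111a1111111a111a1a1aa111111a"
spark.cancelJob(glueRunId) // compile error: type mismatch (found String, required Int)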

1 Answer:

Answer 0 (score: 0)

Here is what I know, written in PySpark:

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Glue job setup (glue_context was implicit in the original snippet).
glue_context = GlueContext(SparkContext.getOrCreate())

args = getResolvedOptions(sys.argv, ["TempDir", "JOB_NAME"])
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

if not my_check():
    # you can use any other exit code and glue will still report failure
    # because the job is not committed
    sys.exit(0)

do_normal_stuff()
job.commit()
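
Since the question is written in Scala, a rough Scala analogue of the same idea might look like the sketch below. This assumes the Glue Scala Job helper (Job.init / Job.commit) mirrors the Python behavior; myCheck and doNormalStuff are hypothetical placeholders:

import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.{GlueArgParser, Job}
import org.apache.spark.SparkContext
import scala.collection.JavaConverters._

val glueContext = new GlueContext(SparkContext.getOrCreate())
val args = GlueArgParser.getResolvedOptions(sysArgs, Seq("JOB_NAME").toArray)
Job.init(args("JOB_NAME"), glueContext, args.asJava)

if (!myCheck()) {
  // Exit before Job.commit(); per the answer above, an uncommitted
  // run should not be reported as a success.
  sys.exit(0)
}

doNormalStuff()
Job.commit()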

Spark jobs and Glue jobs are different things, which is why you cannot use their IDs interchangeably.