I'm having trouble catching a custom exception in Apache Spark.
I run validation over a dataset in a foreach loop like this:
ds.foreach(
  entry => {
    validate(entry)
  })
The validate function throws a custom exception when an entry is invalid.
But in my catch block I cannot catch the custom exception; only a SparkException is thrown, and that is the one that can be caught:
case customException: CustomException =>
  // is never caught
case exception: SparkException =>
  // can be caught
How should I handle this? I need to catch several different exception types, all of which are thrown by the validate method. One option would be to parse the message of the SparkException, which contains the original exception, but that is probably poor design.
Any ideas?
Answer 0 (score: 1)
Instead of matching on the base exception, try matching on its cause:

import org.apache.spark.rdd.RDD
def ignoreArithmeticException(rdd: RDD[java.lang.Integer]) = try {
  rdd.foreach(1 / _)
} catch {
  // Spark wraps exceptions thrown on executors in a SparkException before
  // rethrowing them on the driver; the original exception is the cause.
  case e: SparkException => e.getCause match {
    case _: java.lang.ArithmeticException =>
      println("Ignoring ArithmeticException")
    case _ => throw e
  }
}
This will catch:
Try(ignoreArithmeticException(sc.parallelize(Seq(0))))
00/00/00 00:00:00 ERROR Executor: Exception in task 3.0 in stage 35.0 (TID 143)
java.lang.ArithmeticException: / by zero
at
...
Ignoring ArithmeticException
res42: scala.util.Try[Unit] = Success(())
(though in a rather verbose way), but it won't catch:
Try(ignoreArithmeticException(sc.parallelize(Seq(null))))
00/00/00 00:00:00 ERROR Executor: Exception in task 3.0 in stage 38.0 (TID 155)
java.lang.NullPointerException
at
...
res52: scala.util.Try[Unit] =
Failure(org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 38.0 failed 1 times, most recent failure: Lost task 3.0 in stage 38.0 (TID 155, localhost, executor driver): java.lang.NullPointerException ....
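Applied to the question's setup, the same cause-matching idea can be sketched without a Spark cluster. In the sketch below, a plain RuntimeException stands in for SparkException, and CustomException, validate, and runJob are hypothetical re-creations of the asker's code, not part of any Spark API:

```scala
// Hypothetical stand-in for the asker's custom exception.
class CustomException(msg: String) extends Exception(msg)

// Hypothetical validate: rejects negative entries.
def validate(entry: Int): Unit =
  if (entry < 0) throw new CustomException(s"invalid entry: $entry")

// Simulates Spark wrapping the executor-side exception before
// rethrowing it on the driver (RuntimeException plays SparkException).
def runJob(entries: Seq[Int]): Unit =
  try entries.foreach(validate)
  catch {
    case e: Exception =>
      throw new RuntimeException("Job aborted due to stage failure", e)
  }

// Driver-side handler: match on the cause, not the wrapper.
def process(entries: Seq[Int]): String =
  try { runJob(entries); "ok" }
  catch {
    case e: RuntimeException => e.getCause match {
      case c: CustomException => s"caught cause: ${c.getMessage}"
      case _                  => throw e
    }
  }
```

Matching on getCause keeps the handler typed (one case per custom exception) instead of string-parsing the wrapper's message, which was the design the asker wanted to avoid.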