Catching custom exceptions in Apache Spark

Date: 2018-02-13 23:24:42

Tags: apache-spark exception

I have a problem catching a custom exception in Apache Spark.

I validate a dataset in a foreach loop like this:

ds.foreach(
  entry => {
    validate(entry)
  })

The validate function throws a custom exception when an entry is invalid.
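
For reference, a simplified sketch of what the custom exception and validate function might look like (the entry type and validation logic here are illustrative assumptions, not the original code):

class CustomException(message: String) extends Exception(message)

// hypothetical validation: reject null or empty entries
def validate(entry: String): Unit = {
  if (entry == null || entry.isEmpty)
    throw new CustomException(s"invalid entry: $entry")
}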

But in the catch block I cannot catch the custom exception; only a SparkException is thrown, and that one can be caught:

case customException: CustomException =>
  // is never caught
case exception: SparkException =>
  // can be caught

How can I handle this? I need to catch different kinds of exceptions, all of which are thrown by the validate method. One option would be to read the message of the SparkException, which contains the original exception, but that is probably not good design.

Any ideas?

1 Answer:

Answer 0 (score: 1)

Instead of matching the outer exception, try matching its cause (Spark re-throws task failures on the driver wrapped in a SparkException, with the original exception as the cause):

import org.apache.spark.SparkException
import org.apache.spark.rdd.RDD

def ignoreArithmeticException(rdd: RDD[java.lang.Integer]) = try {
  rdd.foreach(1 / _)  // division by zero inside a task fails the job with a SparkException
} catch {
  case e: SparkException => e.getCause match {
    case _: java.lang.ArithmeticException =>
      println("Ignoring ArithmeticException")
    case _ => throw e
  }
}

This will catch:

Try(ignoreArithmeticException(sc.parallelize(Seq(0))))
00/00/00 00:00:00 ERROR Executor: Exception in task 3.0 in stage 35.0 (TID 143)
java.lang.ArithmeticException: / by zero
    at
    ...
Ignoring ArithmeticException
res42: scala.util.Try[Unit] = Success(())

(though in a rather verbose way), but it will not catch:

Try(ignoreArithmeticException(sc.parallelize(Seq(null))))
00/00/00 00:00:00 ERROR Executor: Exception in task 3.0 in stage 38.0 (TID 155)
java.lang.NullPointerException
    at 
   ...
res52: scala.util.Try[Unit] =
Failure(org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 38.0 failed 1 times, most recent failure: Lost task 3.0 in stage 38.0 (TID 155, localhost, executor driver): java.lang.NullPointerException ....
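
Applying the same cause-matching idea to the question's setup would look roughly like this (a sketch only; ds, validate and CustomException are the names from the question, everything else is assumed):

import org.apache.spark.SparkException

// Run the validation and unwrap the SparkException thrown on the driver,
// matching on its cause instead of on the outer exception type.
try {
  ds.foreach(entry => validate(entry))
} catch {
  case e: SparkException => e.getCause match {
    case ce: CustomException =>
      // the original validation failure from the task
      println(s"Validation failed: ${ce.getMessage}")
    case _ => throw e  // anything else: rethrow the wrapping SparkException
  }
}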