How can I override the default behavior of SparkUncaughtExceptionHandler in the Spark client?

Asked: 2016-01-05 19:11:37

Tags: apache-spark

The documentation for SparkUncaughtExceptionHandler states the following:

  The default uncaught exception handler for Executors terminates the whole process, to avoid getting into a bad state indefinitely. Since Executors are relatively lightweight, it's better to fail fast when things go wrong.

Is it possible to override this behavior?

Alternatively, is there a better way to determine that Spark is unavailable before getting into the situation that triggers SparkUncaughtExceptionHandler?

My scenario: I run tests in an environment where Spark may or may not be available. If Spark is available, I want to run the tests; if not, I want to skip them.

I am attempting something like this:

import scala.util.Try
import org.apache.spark.{SparkConf, SparkContext}

object SparkUtility {
  def Conf(url: String, app: String): SparkConf =
    new SparkConf().setMaster(url).setAppName(app)

  def Connect(conf: SparkConf): Try[SparkContext] =
    Try(new SparkContext(conf))
}

import scala.util.{Failure, Success}
import org.scalatest.FlatSpec

class SampleTestSpec extends FlatSpec {

  SparkUtility.Connect(SparkUtility.Conf("spark://HOST:7077", "appname")) match {
    case Success(sc) => runAllTests(sc)
    case Failure(f) => runOfflineTests()
  }

  // ...
}

Unfortunately, when Spark is offline, the Failure case is never matched and the application terminates with the output below (client side). As the stack trace shows, the exception is raised on Spark's appclient-registration-retry-thread rather than on the thread constructing the SparkContext, so the Try never sees it before SparkUncaughtExceptionHandler shuts the JVM down:

16/01/05 13:35:20 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@BADHOST:7077] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://sparkMaster@BADHOST:7077]] Caused by: [Connection refused: no further information: BADHOST/127.0.0.1:7077]
16/01/05 13:35:39 INFO AppClient$ClientEndpoint: Connecting to master spark://BADHOST:7077...
16/01/05 13:35:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4ae4230 rejected from java.util.concurrent.ThreadPoolExecutor@4c7b5fb6[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 3]
        at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
        at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
16/01/05 13:35:39 INFO DiskBlockManager: Shutdown hook called
16/01/05 13:35:39 INFO ShutdownHookManager: Shutdown hook called
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...

1 Answer:

Answer 0 (score: 0)

This is a bit of a hack, but one thing you can do is attempt to open a TCP connection to the master. Success is no guarantee that creating a SparkContext will work, but if the probe fails, the master is definitely not reachable at that address.

import java.io.IOException
import java.net._

object SparkAvailabilityCheck {
  // Probes the master by opening a socket. If that fails, the master is
  // definitely not available at the given location, though non-failure is
  // no guarantee that creating a SparkContext will succeed.
  def isSparkOnline(masterLocation: URI): Boolean = {
    try {
      val host = InetAddress.getByName(masterLocation.getHost)
      val socket = new Socket(host, masterLocation.getPort)
      socket.close()
      true
    } catch {
      // Catch IOException rather than only ConnectException, so that an
      // unresolvable host (UnknownHostException) also counts as "offline"
      // instead of escaping the probe.
      case _: IOException => false
    }
  }
}
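
For the test-skipping use case in the question, the probe pairs naturally with ScalaTest's assume, which cancels (rather than fails) a test whose precondition does not hold. A minimal sketch, assuming the same master URL as in the question (the spec and test names are illustrative):

import java.net.URI
import org.scalatest.FlatSpec

class SparkDependentSpec extends FlatSpec {
  // Probe once, lazily, and share the result across the suite.
  lazy val sparkOnline: Boolean =
    SparkAvailabilityCheck.isSparkOnline(new URI("spark://HOST:7077"))

  "A Spark-backed feature" should "work against a live master" in {
    assume(sparkOnline) // throws TestCanceledException when false; the test is reported as canceled, not failed
    // ... test body that may safely create a SparkContext ...
  }
}

Using assume keeps the suite green while still distinguishing skipped-because-offline tests from genuine failures in the report. If the probe might hang against a firewalled host, constructing an unconnected Socket() and calling connect(new InetSocketAddress(host, port), timeoutMs) bounds the wait.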