The documentation for SparkUncaughtExceptionHandler states the following:
The default uncaught exception handler for Executors terminates the whole process, to avoid getting into a bad state indefinitely. Since Executors are relatively lightweight, it's better to fail fast when things go wrong.
Is it possible to override this behavior?
Alternatively, is there a better way to determine whether Spark is unavailable before getting into the situation that invokes SparkUncaughtExceptionHandler?
My scenario is that I run tests in an environment where Spark may or may not be available. If Spark is available, I want to run the tests; if not, I want to skip them.
I'm attempting something like this:
import scala.util.{Failure, Success, Try}
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.FlatSpec

object SparkUtility {
  def Conf(url: String, app: String): SparkConf =
    new SparkConf().setMaster(url).setAppName(app)

  def Connect(conf: SparkConf): Try[SparkContext] =
    Try(new SparkContext(conf))
}

class SampleTestSpec extends FlatSpec {
  SparkUtility.Connect(SparkUtility.Conf("spark://HOST:7077", "appname")) match {
    case Success(sc) => runAllTests(sc)
    case Failure(f)  => runOfflineTests()
  }
  // ...
}
Unfortunately, when Spark is offline the Failure case never matches, and the application dies with the following (client side):
16/01/05 13:35:20 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster@BADHOST:7077] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://sparkMaster@BADHOST:7077]] Caused by: [Connection refused: no further information: BADHOST/127.0.0.1:7077]
16/01/05 13:35:39 INFO AppClient$ClientEndpoint: Connecting to master spark://BADHOST:7077...
16/01/05 13:35:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@4ae4230 rejected from java.util.concurrent.ThreadPoolExecutor@4c7b5fb6[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 3]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/01/05 13:35:39 INFO DiskBlockManager: Shutdown hook called
16/01/05 13:35:39 INFO ShutdownHookManager: Shutdown hook called
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...
16/01/05 13:35:39 INFO ShutdownHookManager: Deleting directory ...
Answer (score: 0):
This is a bit of a hack, but one thing you can do is try to open a TCP connection to the master. Success doesn't guarantee you'll be able to connect, but if it fails, the master is definitely unavailable.
import java.net._

object SparkAvailabilityCheck {

  // Probes the master by opening a socket. If that fails, the master is
  // definitely not available at the given location, though non-failure is
  // no guarantee that creating a SparkContext will succeed.
  def isSparkOnline(masterLocation: URI): Boolean = {
    try {
      val host = InetAddress.getByName(masterLocation.getHost)
      val socket = new Socket(host, masterLocation.getPort)
      socket.close()
      true
    } catch {
      case ex: ConnectException => false
    }
  }
}
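For the test scenario in the question, here is a minimal sketch of how the probe could be wired into the spec. The master URL, the "appname" string, SparkUtility, and the runAllTests/runOfflineTests helpers are taken from the question; the masterUrl value and the URI construction are assumptions for illustration.

import java.net.URI
import scala.util.{Failure, Success}
import org.scalatest.FlatSpec

class SampleTestSpec extends FlatSpec {
  private val masterUrl = "spark://HOST:7077"

  // Probe the master first and only attempt to build a SparkContext when the
  // socket check succeeds. A successful probe is still no guarantee, but a
  // failed one means Spark is offline, so we skip straight to the offline tests.
  if (SparkAvailabilityCheck.isSparkOnline(new URI(masterUrl))) {
    SparkUtility.Connect(SparkUtility.Conf(masterUrl, "appname")) match {
      case Success(sc) => runAllTests(sc)
      case Failure(f)  => runOfflineTests()
    }
  } else {
    runOfflineTests()
  }
}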