I am having trouble stopping the streaming context from inside foreachRDD once a condition is met. As soon as ssc.stop() executes inside the function foo, I get an interrupt error.
Simplified code:
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setAppName("appname").setMaster("local"))
  foo(123, sc)
  //foo(312, sc) can I call foo again here?
  sc.stop()
}

def foo(param1: Integer, sc: SparkContext): Int = {
  val ssc = new StreamingContext(sc, Seconds(1))
  var res = 0
  //dummy data, but actual datatypes (not relevant to the error I get in this code)
  val inputData: mutable.Queue[RDD[Int]] = mutable.Queue()
  val inputStream: InputDStream[Int] = ssc.queueStream(inputData)
  inputData += sc.makeRDD(List(1, 2))
  val rdds_list = some_other_fn(inputStream, param1) //returns a DStream
  rdds_list.foreachRDD { rdd =>
    def foo1(rdd: RDD[<some_type_2>]) = {
      if (condition1) {
        println("condition satisfied!") //prints correctly
        res = do_stuff(rdd) //executes correctly
        println("result: " + res) //executes correctly (and output is as intended)
      } else {
        println("stopping streaming context!")
        ssc.stop(stopSparkContext = false) //error occurs here
      }
    }
    foo1(rdd)
  }
  ssc.start()
  ssc.awaitTermination()
  res
}
Error log:
condition satisfied!
result: 124124
stopping streaming context!
[error] (pool-11-thread-1) java.lang.Error: java.lang.InterruptedException
java.lang.Error: java.lang.InterruptedException
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.spark.util.AsynchronousListenerBus.stop(AsynchronousListenerBus.scala:160)
at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:98)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:573)
at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:555)
at edu.gatech.cse8803.main.Main$$anonfun$testClustering$1.foo$1(Main.scala:315)
at edu.gatech.cse8803.main.Main$$anonfun$testClustering$1.apply(Main.scala:318)
at edu.gatech.cse8803.main.Main$$anonfun$testClustering$1.apply(Main.scala:306)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:534)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:534)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:176)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:176)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:176)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I tried using ssc.stop(stopSparkContext = true, stopGracefully = true), but then I get:
WARN scheduler.JobGenerator -
Timed out while stopping the job generator (timeout = 10000)
and the program hangs after foo is called (i.e. it never finishes and I have to Ctrl+C it).
Is this the right way to stop a streaming context? Also, if I want to call foo multiple times, should I change anything? I understand that an application should have only one SparkContext, which is why I am trying to reuse it. Or should I shut the SparkContext down by passing stopSparkContext = true?
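For reference while reading the answer below: the stack trace shows the stop being issued from the JobScheduler's own worker thread, so the usual pattern is to only raise a flag inside foreachRDD and perform the actual stop on the driver's main thread. A minimal, self-contained sketch of that pattern; the queue input is kept from the code above, but the empty-batch stop condition and the polling interval are placeholder assumptions, not the real logic:

import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StopFromMainThread {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("appname").setMaster("local"))
    val ssc = new StreamingContext(sc, Seconds(1))

    val inputData: mutable.Queue[RDD[Int]] = mutable.Queue()
    val inputStream = ssc.queueStream(inputData)
    inputData += sc.makeRDD(List(1, 2))

    val stopRequested = new AtomicBoolean(false) // shared between job thread and main thread

    inputStream.foreachRDD { rdd =>
      if (rdd.isEmpty()) {
        stopRequested.set(true) // only signal; do NOT call ssc.stop() on this thread
      } else {
        println("sum: " + rdd.sum())
      }
    }

    ssc.start()
    // Poll from the main thread; awaitTerminationOrTimeout returns true once stopped.
    while (!ssc.awaitTerminationOrTimeout(1000) && !stopRequested.get()) {}
    ssc.stop(stopSparkContext = false, stopGracefully = true) // safe: not the job thread
    sc.stop()
  }
}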
My environment:
Edit: I have looked at other similar questions and tried all of their answers - still no luck! :(
Answer 0 (score: 0)
It indicates that you are shutting down the StreamingContext from inside rdds_list, which that same StreamingContext is processing, while the Spark driver is waiting for the job to complete. The context should be shut down separately.
Also, contexts should not be created and shut down at such a frequency.
What I would suggest is the following:
Create the StreamingContext in main() and pass it to foo(...), which changes foo's signature to:
def foo(param1: Integer, ssc: StreamingContext)
To shut down both contexts of a streaming application safely, do something like:
sys.ShutdownHookThread {
  // Executes when the app receives a shutdown signal
  log.info("Gracefully stopping Spark Context")
  ssc.stop(stopSparkContext = true, stopGracefully = true) // also stops the SparkContext
  log.info("Application stopped")
}
But if you need to shut down from programmatic logic, close the SparkContext through the StreamingContext, i.e. call ssc.stop(stopSparkContext = true) instead of stopping the two contexts separately.
This would make your main() look like:
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setAppName("appname").setMaster("local"))
  val ssc = new StreamingContext(sc, Seconds(1))
  sys.ShutdownHookThread {
    // Executes when the app receives a shutdown signal
    log.info("Gracefully stopping Spark Context")
    ssc.stop(stopSparkContext = true, stopGracefully = true) // also stops the SparkContext
    log.info("Application stopped")
  }
  foo(123, ssc)
  ssc.stop(stopSparkContext = true, stopGracefully = true)
}
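One more note on calling foo several times: a StreamingContext cannot be restarted once it has been stopped, and only one can be active per JVM, so passing a single ssc around works for a single run only. A hedged sketch of the alternative shape for repeated calls; the body of foo is elided, the names come from the question, and the structure is an assumption rather than the answer's recommendation:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: reuse one SparkContext, but give each call its own StreamingContext.
def foo(param1: Integer, sc: SparkContext): Unit = {
  val ssc = new StreamingContext(sc, Seconds(1))
  // ... build the streams, start, and await as before ...
  ssc.stop(stopSparkContext = false, stopGracefully = true) // keep sc alive
}

def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setAppName("appname").setMaster("local"))
  foo(123, sc)
  foo(312, sc) // fine: the previous StreamingContext was stopped first
  sc.stop()    // stop the SparkContext once, at the very end
}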