Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout

Date: 2017-05-27 03:31:59

Tags: spark-streaming

I am learning Spark Streaming, and I have a program that is meant to find the top 5 words.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object Top5 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    conf.setAppName("AppName")
    conf.setMaster("spark://SparkMaster:7077")
    // One batch every 10 seconds.
    val ssc = new StreamingContext(conf, Seconds(10))
    // The host argument must be a bare hostname; the port (9999) is passed separately.
    val hottestStream = ssc.socketTextStream("SparkMaster", 9999)
    // Each input line is expected to look like "user word"; take the second field.
    val searchPair = hottestStream.map(_.split(" ")(1)).map(item => (item, 1))
    // Sum the counts over a 60-second window, sliding every 20 seconds.
    val hottestDStream = searchPair.reduceByKeyAndWindow((v1: Int, v2: Int) => v1 + v2, Seconds(60), Seconds(20))
    hottestDStream.transform { hottestItemRDD =>
      // Swap to (count, word), sort descending, swap back, and keep the top 5.
      val top5 = hottestItemRDD.map(pair => (pair._2, pair._1)).sortByKey(false)
        .map(pair => (pair._2, pair._1)).take(5)
      for (item <- top5) {
        println(item)
      }
      hottestItemRDD
    }.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
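For context, socketTextStream reads from a plain line-oriented TCP source on the given host and port; during testing such a source is commonly provided with netcat, e.g. nc -lk 9999 on SparkMaster (an assumed setup, not stated in the question).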

When I run it in a Spark cluster environment, the error shows

Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout

I searched for my problem on Stack Overflow. There is a similar question, org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 seconds]. This timeout is controlled by spark.rpc.lookupTimeout, and its answer tells me to increase spark.timeout.network. Is that right? Also, where can I find spark.timeout.network?

1 Answer:

Answer 0 (score: 2)

For heavy workloads, it is recommended to increase spark.network.timeout to 800 seconds:

--conf spark.network.timeout=800

See https://developer.ibm.com/hadoop/2016/07/18/troubleshooting-and-tuning-spark-for-heavy-workloads/
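The same timeout can also be set programmatically when building the SparkConf instead of on the spark-submit command line. A minimal sketch, reusing the app name and master URL from the question:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("AppName")
  .setMaster("spark://SparkMaster:7077")
  .set("spark.network.timeout", "800s") // 800 seconds; a bare "800", as in the answer above, is also read as seconds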