Handling Spark state isTimingOut

Date: 2016-02-14 16:09:14

Tags: scala apache-spark streaming state spark-streaming

I am using the new mapWithState function in Spark Streaming (1.6) with a state timeout. I want to take the timed-out state and add it to another RDD, so it can be used in computations further down the road:

val aggedlogs = sc.emptyRDD[MyLog]

val mappingFunc = (key: String, newlog: Option[MyLog], state: State[MyLog]) => {
    val _newLog = newlog.getOrElse(null)
    if (state.exists() && _newLog != null) {
        val stateLog = state.get()
        val combinedLog = LogUtil.CombineLogs(_newLog, stateLog)
        state.update(combinedLog)
    } else if (_newLog != null) {
        state.update(_newLog)
    }

    if (state.isTimingOut()) {
        val stateLog = state.get()
        // This is the line that fails: sc and aggedlogs live on the driver
        aggedlogs.union(sc.parallelize(List(stateLog), 1))
    }
    val stateLog = state.get()
    (key, stateLog)
}

val stateDstream = reducedlogs.mapWithState(StateSpec.function(mappingFunc).timeout(Seconds(10)))

But when I try to add it to the RDD inside the StateSpec function, I get an error that the function is not serializable. Any ideas on how to get past this problem?
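For reference, the error arises because the mapping function is serialized and shipped to the executors, while `SparkContext` (and any RDD handle such as `aggedlogs`) lives only on the driver; a closure that captures either of them cannot be serialized. A minimal, Spark-free sketch of the same failure mode, using a hypothetical non-serializable `DriverHandle` standing in for `SparkContext`:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for SparkContext: a driver-side handle that is not Serializable.
class DriverHandle

object SerializationDemo {
  // Mimics what Spark does before shipping a closure to the executors.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch { case _: NotSerializableException => false }

  def main(args: Array[String]): Unit = {
    val handle = new DriverHandle

    // Captures the driver-only handle, like calling sc.parallelize
    // inside the mapWithState function.
    val badClosure: String => String = s => s + handle.hashCode()

    // Captures nothing driver-side, so it serializes fine.
    val goodClosure: String => String = s => s.toUpperCase

    println(isSerializable(badClosure))  // false
    println(isSerializable(goodClosure)) // true
  }
}
```

The fix the edit below arrives at follows from this: instead of touching driver-side objects inside the function, return the timed-out record and filter for it downstream.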

EDIT: After digging deeper, I found that my approach was wrong. Before trying this solution, I tried to get the timed-out logs from stateSnapshots(), but they were no longer there. I changed the mapping function to:

def mappingFunc(key: String, newlog: Option[MyLog], state: State[MyLog]): Option[(String, MyLog)] = {
    if (state.isTimingOut()) {
        // On a timeout invocation newlog is None; the state can still be
        // read, but not updated. Flag the log so it can be filtered out
        // downstream.
        val stateLog = state.get()
        stateLog.timinigOut = true
        System.out.println("timinigOut : " + key)
        Some((key, stateLog))
    } else {
        val _newLog = newlog.getOrElse(null)
        if (state.exists() && _newLog != null) {
            val combinedLog = LogUtil.CombineLogs(_newLog, state.get())
            state.update(combinedLog)
            Some((key, combinedLog))
        } else if (_newLog != null) {
            state.update(_newLog)
            Some((key, _newLog))
        } else {
            None
        }
    }
}

With this I managed to filter the mapped-with-state DStream to find the logs that timed out in each batch:

val stateDstream = reducedlogs.mapWithState(
    StateSpec.function(mappingFunc _).timeout(Seconds(60)))

val timingoutlogs = stateDstream.filter(filtertimingout)
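The body of `filtertimingout` is not shown in the question; a plausible sketch, assuming `MyLog` is a class carrying the mutable `timinigOut` flag set by the mapping function (those two names come from the code above; everything else here is hypothetical):

```scala
// Hypothetical minimal MyLog, mirroring the field used in the question.
case class MyLog(id: String, var timinigOut: Boolean = false)

object FilterDemo {
  // The mapped DStream carries Option[(String, MyLog)] because the mapping
  // function returns Option; keep only records flagged as timed out.
  def filtertimingout(record: Option[(String, MyLog)]): Boolean =
    record.exists { case (_, log) => log.timinigOut }

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      Some(("a", MyLog("a", timinigOut = true))),
      Some(("b", MyLog("b"))),
      None
    )
    val timedOut = batch.filter(filtertimingout)
    println(timedOut.map(_.get._1)) // List(a)
  }
}
```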

0 Answers:

No answers