Question

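I am using the new mapWithState function in Spark Streaming (1.6) with a state timeout. I want to take the states that time out and add them to another RDD so I can use them in later computations: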

val aggedlogs = sc.emptyRDD[MyLog]
val mappingFunc = (key: String, newlog: Option[MyLog], state: State[MyLog]) => {
    val _newLog = newlog.getOrElse(null)
    if (state.exists() && _newLog != null) {
        // Merge the incoming log into the existing state.
        val stateLog = state.get()
        val combinedLog = LogUtil.CombineLogs(_newLog, stateLog)
        state.update(combinedLog)
    }
    else if (_newLog != null) {
        // First log seen for this key.
        state.update(_newLog)
    }

    if (state.isTimingOut()) {
        val stateLog = state.get()
        // This is the line that fails: it references sc and an RDD inside the mapping function.
        aggedlogs.union(sc.parallelize(List(stateLog), 1))
    }
    val stateLog = state.get()
    (key, stateLog)
}

val stateDstream = reducedlogs.mapWithState(StateSpec.function(mappingFunc).timeout(Seconds(10)))

But when I try to add it to an RDD inside the StateSpec function, I get an error saying the function is not serializable. Any ideas on how to get past this?
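A likely cause, offered as an assumption rather than a confirmed diagnosis: the mapping function refers to `sc`, so serializing the function also means serializing the SparkContext, which is not serializable. The sketch below reproduces that effect in plain Scala without Spark. `FakeContext`, `isSerializable`, and `demo` are hypothetical names made up for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: like the real one, it is not Serializable.
class FakeContext {
  def parallelize(xs: List[String]): List[String] = xs
}

// Returns true if Java serialization of `obj` succeeds.
def isSerializable(obj: AnyRef): Boolean = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  try { out.writeObject(obj); true }
  catch { case _: NotSerializableException => false }
  finally out.close()
}

def demo(): (Boolean, Boolean) = {
  val ctx = new FakeContext
  // Captures ctx, so serializing the function must also serialize ctx.
  val capturing: String => List[String] = s => ctx.parallelize(List(s))
  // Captures nothing from the enclosing scope.
  val clean: String => List[String] = s => List(s)
  (isSerializable(capturing), isSerializable(clean))
}
```

Separately from serialization, `union` returns a new RDD and does not modify `aggedlogs`, so the original line would not accumulate anything even if it ran.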

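Edit: After digging deeper, I found that my approach was wrong. Before trying this solution, I had tried to get the timed-out logs from stateSnapshots(), but they were no longer there. I changed the mapping function to: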

def mappingFunc(key: String, newlog: Option[MyLog], state: State[MyLog]): Option[(String, MyLog)] = {
    if (state.isTimingOut()) {
        // Timeout call: there is no new log, and the state can no longer be updated.
        val stateLog = state.get()
        stateLog.timinigOut = true
        println("timinigOut  : " + key)
        Some((key, stateLog))
    } else {
        newlog.map { log =>
            // Merge with the existing state if there is one, otherwise start from the new log.
            val combinedLog =
                if (state.exists()) LogUtil.CombineLogs(log, state.get()) else log
            state.update(combinedLog)
            (key, combinedLog)
        }
    }
}

I managed to filter the mapWithState DStream to find the logs that timed out in each batch:

val stateDstream = reducedlogs.mapWithState(
    StateSpec.function(mappingFunc _).timeout(Seconds(60)))

val tiningoutlogs = stateDstream.filter(filtertimingout)
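The post does not show `filtertimingout`. A minimal sketch of what it might look like, assuming the stream's elements are the `Option[(String, MyLog)]` values returned by the mapping function and that `MyLog` has a mutable `timinigOut` flag (spelled as in the question). The `MyLog` class here is a hypothetical stand-in, not the real one:

```scala
// Hypothetical minimal MyLog; the real class presumably has more fields.
case class MyLog(message: String, var timinigOut: Boolean = false)

// Keeps only entries that the mapping function flagged as timed out.
// None (no output for the key in this batch) is dropped as well.
def filtertimingout(entry: Option[(String, MyLog)]): Boolean =
  entry.exists { case (_, log) => log.timinigOut }
```

Because this predicate touches only the record itself and never `sc` or an RDD, it avoids the serialization problem described earlier in the question.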

Handling Spark state isTimingOut

0 answers: