This is a working code example:
case class log(log_version: String, log_ip: String, log_from: String, SDK: String,
               action_time: String, action: String, sn: String, fdn: String,
               typ: String, vid: String, version: String, device_id: String,
               ip: String, timestamp: String) extends Serializable

val RDD = input.map { line =>
  val p = line.split("\\|")
  val log_version = p(0)
  val log_ip = p(1)
  val log_from = p(2)
  val SDK = p(3)
  val action_time = p(4)
  val action = p(5)
  val sn = p(6)
  val JsonMap = if (p.length == 8) {
    val jsontest = parse(p(7), useBigDecimalForDouble = true)
    jsontest.extract[Map[String, String]]
  } else Map("error" -> "empty")
  val fdn: String = JsonMap.get("fdn").getOrElse("null")
  val typ: String = JsonMap.get("type").getOrElse("null")
  val vid: String = JsonMap.get("vid").getOrElse("null")
  val version: String = JsonMap.get("version").getOrElse("null")
  val device_id: String = JsonMap.get("device_id").getOrElse("null")
  val ip: String = JsonMap.get("ip").getOrElse("null")
  val timestamp: String = JsonMap.get("timestamp").getOrElse("null")
  log(log_version, log_ip, log_from, SDK, action_time, action, sn,
      fdn, typ, vid, version, device_id, ip, timestamp)
}.toDF()
Whenever I try to access sc, I get the following error. What am I doing wrong here?
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
I changed my code to:
case class JsonLong(fdn: String, typ: String, vid: String, version: String,
                    device_id: String, ip: String, timestamp: String)

case class log(log_version: String, log_ip: String, log_from: String, SDK: String,
               action_time: String, action: String, sn: String,
               JsonClass: JsonLong) extends Serializable

val RDD = input.map { line =>
  val p = line.split("\\|")
  val log_version = p(0)
  val log_ip = p(1)
  val log_from = p(2)
  val SDK = p(3)
  val action_time = p(4)
  val action = p(5)
  val sn = p(6)
  val JsonMap: JsonLong = if (p.length == 8) {
    val jsontest = parse(p(7), useBigDecimalForDouble = true)
    val x = jsontest.extract[Map[String, String]]
    JsonLong(
      x.get("fdn").getOrElse("NULL"),
      x.get("type").getOrElse("NULL"),
      x.get("vid").getOrElse("NULL"),
      x.get("version").getOrElse("NULL"),
      x.get("device_id").getOrElse("NULL"),
      x.get("ip").getOrElse("NULL"),
      x.get("timestamp").getOrElse("NULL"))
  } else null
  log(log_version, log_ip, log_from, SDK, action_time, action, sn, JsonMap)
}.toDF()
But it still fails. Why? I don't understand. Can anyone explain this to me?
Answer 0 (score: 2)
Spark needs to be able to serialize the closure in order to send it to each executor. As a guess at what in your code cannot be serialized: it looks like you are using json4s, which requires an implicit Formats to extract a Map[String, String]. Try declaring the implicit inside the map function.
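For illustration, a minimal sketch of that fix applied to the second version of the code. It assumes input is an RDD[String] and that json4s and the Spark implicits needed for toDF() are available, as in the original; the substantive change is that implicit val formats lives inside the closure, so nothing from the driver has to be serialized for extract to work:

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

case class JsonLong(fdn: String, typ: String, vid: String, version: String,
                    device_id: String, ip: String, timestamp: String)

case class log(log_version: String, log_ip: String, log_from: String, SDK: String,
               action_time: String, action: String, sn: String,
               JsonClass: JsonLong) extends Serializable

val df = input.map { line =>
  // Declared inside the closure: each executor resolves its own Formats,
  // so Spark never tries to ship one from the driver.
  implicit val formats: Formats = DefaultFormats
  val p = line.split("\\|")
  val jsonPart = if (p.length == 8) {
    val x = parse(p(7), useBigDecimalForDouble = true).extract[Map[String, String]]
    JsonLong(
      x.getOrElse("fdn", "NULL"),
      x.getOrElse("type", "NULL"),
      x.getOrElse("vid", "NULL"),
      x.getOrElse("version", "NULL"),
      x.getOrElse("device_id", "NULL"),
      x.getOrElse("ip", "NULL"),
      x.getOrElse("timestamp", "NULL"))
  } else null
  log(p(0), p(1), p(2), p(3), p(4), p(5), p(6), jsonPart)
}.toDF()

Since DefaultFormats is a singleton object, referencing it inside the closure costs essentially nothing per record; if you ever do have expensive per-record setup, mapPartitions lets you perform it once per partition instead.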