When I execute this code in Spark Streaming, I get a serialization error (see below):
val conf = ConfigFactory.load("application.conf").getConfig("conf")
val endPoint = conf.getString("endPoint")
val operation = conf.getString("operation")
val param = conf.getString("param")

result.foreachRDD { jsrdd =>
  jsrdd.map { jsobj =>
    val docId = (jsobj \ "id").as[JsString].value
    val response: HttpResponse[String] =
      Http(apiURL + "/" + endPoint + "/" + docId + "/" + operation)
        .timeout(connTimeoutMs = 1000, readTimeoutMs = 5000)
        .param(param, jsobj.toString())
        .asString
    val output = Json.parse(response.body) \ "annotation" \ "tags"
    jsobj.as[JsObject] + ("tags" -> output.as[JsObject])
  }
}
So, as far as I can tell, the problem is with the scalaj Http API. How can I solve this? Obviously I cannot change the API.
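Looking at the stack trace below, though, the $outer entry suggests it is my enclosing class (org.consumer.kafka.KafkaJsonConsumer) that gets pulled into the closure, not anything inside scalaj itself. A minimal illustration of the pattern I suspect, using a hypothetical skeleton of my class (the names here are assumed for illustration, not my real code):

import org.apache.spark.streaming.dstream.DStream
import play.api.libs.json.JsValue

class KafkaJsonConsumer { // hypothetical skeleton; not Serializable
  val apiURL: String = "http://host:port" // field of the enclosing class

  def run(result: DStream[JsValue]): Unit = {
    result.foreachRDD { rdd =>
      // `apiURL` below really means `this.apiURL`, so the closure
      // captures `this` (the whole KafkaJsonConsumer) as $outer, and
      // checkpointing then tries, and fails, to serialize it.
      rdd.map(js => apiURL + js.toString()).count()
    }
  }
}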
java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serializable
org.consumer.kafka.KafkaJsonConsumer
Serialization stack:
	- object not serializable (class: org.consumer.kafka.KafkaJsonConsumer, value: org.consumer.kafka.KafkaJsonConsumer@f91da5e)
	- field (class: org.consumer.kafka.KafkaJsonConsumer$$anonfun$run$1, name: $outer, type: class org.consumer.kafka.KafkaJsonConsumer)
	- object (class org.consumer.kafka.KafkaJsonConsumer$$anonfun$run$1, <function1>)
	- field (class: org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, name: cleanedF$1, type: interface scala.Function1)
	- object (class org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3, <function2>)
	- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
	- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@761956ac)
	- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
	- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [0 checkpoint files])
	- writeObject data (class: org.apache.spark.streaming.dstream.DStream)
	- object (class org.apache.spark.streaming.dstream.ForEachDStream, org.apache.spark.streaming.dstream.ForEachDStream@704641e3)
	- element of array (index: 0)
	- array (class [Ljava.lang.Object;, size 16)
	- field (class: scala.collection.mutable.ArrayBuffer, name: array, type: class [Ljava.lang.Object;)
	- object (class scala.collection.mutable.ArrayBuffer, ArrayBuffer(org.apache.spark.streaming.dstream.ForEachDStream@704641e3, org.apache.spark.streaming.dstream.ForEachDStream@761956ac))
	- writeObject data (class: org.apache.spark.streaming.dstream.DStreamCheckpointData)
	- object (class org.apache.spark.streaming.dstream.DStreamCheckpointData, [0 checkpoint files])
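If that reading is right, one possible workaround, without touching the scalaj API at all, would be to copy everything the closure needs into plain local vals before foreachRDD, so Spark only has to serialize strings. This is just a sketch, assuming apiURL is a field of my class and result is a DStream of Play JSON values:

import play.api.libs.json._
import scalaj.http.{Http, HttpResponse}

val url = apiURL // local copy, so the closure no longer references `this`
result.foreachRDD { jsrdd =>
  val annotated = jsrdd.map { jsobj =>
    val docId = (jsobj \ "id").as[JsString].value
    val response: HttpResponse[String] =
      Http(url + "/" + endPoint + "/" + docId + "/" + operation)
        .timeout(connTimeoutMs = 1000, readTimeoutMs = 5000)
        .param(param, jsobj.toString())
        .asString
    val output = Json.parse(response.body) \ "annotation" \ "tags"
    jsobj.as[JsObject] + ("tags" -> output.as[JsObject])
  }
  annotated.count() // map is lazy; without an action nothing runs
}

endPoint, operation and param are already local vals in my code, so they should be safe to capture; the risky reference would be apiURL.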