While the following snippet
import play.api.libs.json._
case class Person(name: String, lovesPandas: Boolean)
implicit val personFormat = Json.format[Person]
val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""
val jsonParse = Json.parse(text)
val result = Json.fromJson[Person](jsonParse)
result.get
runs fine in a Jupyter notebook using the Apache Toree kernel, the Spark version of it,
import org.apache.spark._
import play.api.libs.json._
import play.api.libs.functional.syntax._
case class Person(name: String, lovesPandas: Boolean)
implicit val personReads = Json.format[Person]
val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""
val input = sc.parallelize(List(text))
val parsed = input.map(Json.parse(_))
val result = parsed.flatMap(record => {
personReads.reads(record).asOpt
})
result.filter(_.lovesPandas).map(Json.toJson(_)).saveAsTextFile("files/out/pandainfo.json")
returns
Name: org.apache.spark.SparkException
Message: Task not serializable
StackTrace: org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
[...]
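I understand that objects shipped to other nodes have to be serializable, which does not seem to be possible here. So is the example broken, or am I doing something wrong? How can I fix this?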
By the way,
import org.apache.spark._
import play.api.libs.json._
import play.api.libs.functional.syntax._
val text = """{"name":"Sparky The Bear", "lovesPandas":true}"""
case class Person(name: String, lovesPandas: Boolean)
val input = sc.parallelize(List(text))
val parsed = input.map(Json.parse(_))
val result = parsed.flatMap(record => {
implicit val personReads = Json.format[Person]
personReads.reads(record).asOpt
})
result.collect
results in
Name: org.apache.spark.SparkException
Message: Job aborted due to stage failure: Task 3.0 in stage 0.0 (TID 3) had a not serializable result: play.api.libs.json.OFormat$$anon$1
Serialization stack:
- object not serializable (class: play.api.libs.json.OFormat$$anon$1, value:
[...]
I use result.collect to test whether this part of the code is correct.
Also, if I write
result.filter(_.lovesPandas).map{Json.toJson(_)}.saveAsTextFile("files/out/pandainfo.json")
instead of result.collect, I get
Name: Compile Error
Message: <console>:166: error: No Json serializer found for type Person. Try to implement an implicit Writes or Format for this type.
Json.toJson(_)
^
StackTrace:
So I guess I have to declare Person as Serializable. However, extends Serializable raises an error, and appending with Serializable at the end of the declaration is not valid either:
Name: Compile Error
Message: <console>:2: error: ';' expected but 'with' found.
case class Person(name: String, lovesPandas: Boolean) with Serializable
^
Answer 0 (score: 0)
I'll go out on a limb and say that the value returned by Json.format is not serializable.
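One way to test that hunch without launching a Spark job is to push the suspect value through plain Java serialization, which is what Spark uses for closures. The sketch below relies only on the JVM standard library. `SerializationProbe`, `isJavaSerializable`, `OpaqueFormat`, and `PersonRecord` are hypothetical names, and `OpaqueFormat` is merely a stand-in for the object returned by Json.format, not the real Play class.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationProbe {
  // Hypothetical stand-in for a value that does not implement
  // java.io.Serializable, playing the role of the Play format.
  class OpaqueFormat

  // Case classes are Serializable out of the box.
  case class PersonRecord(name: String, lovesPandas: Boolean)

  // Returns true if `value` survives Java serialization.
  def isJavaSerializable(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }

  def main(args: Array[String]): Unit = {
    println(isJavaSerializable(PersonRecord("Sparky The Bear", lovesPandas = true))) // true
    println(isJavaSerializable(new OpaqueFormat))                                     // false
  }
}
```

Calling the same helper on the real `Json.format[Person]` value in the notebook should show whether the format itself is the object Spark refuses to ship.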
To work around this, you can create it inside the flatMap:
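val result = input.flatMap(record => {
  // Build the format inside the closure so it is never shipped from the driver
  val personReads = Json.format[Person]
  // input holds the raw strings, so each record is parsed here
  val jsValue = Json.parse(record)
  personReads.reads(jsValue).asOpt
})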
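If rebuilding the format for every record feels wasteful, a common Spark idiom is to keep a non-serializable helper in a `@transient lazy val` of a serializable wrapper. The helper is then skipped during serialization and rebuilt lazily on whichever JVM deserializes the wrapper. The sketch below shows only the mechanism, using the standard library. `TransientLazyDemo`, `Helper`, `HelperHolder`, and `roundTrip` are hypothetical names, with `Helper` standing in for the Play format. Whether this behaves the same inside the Toree REPL is an assumption, not something verified here.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object TransientLazyDemo {
  // Hypothetical stand-in for a non-serializable helper such as a JSON format.
  class Helper {
    def shout(s: String): String = s.toUpperCase
  }

  // Serializable wrapper: the helper is excluded from serialization and
  // re-created on first access after deserialization.
  class HelperHolder extends Serializable {
    @transient lazy val helper: Helper = new Helper
  }

  // Serializes and deserializes a value, mimicking shipping it to another JVM.
  def roundTrip[T <: AnyRef](value: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val holder = new HelperHolder
    println(holder.helper.shout("bear")) // BEAR (helper initialized before serializing)
    val copy = roundTrip(holder)
    println(copy.helper.shout("panda")) // PANDA (helper rebuilt on the copy)
  }
}
```

In a Spark job the wrapper instance would be referenced from the closure in place of the format itself.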
I think what causes the problem is the fact that the JsValue returned by Json.parse is not serializable.
You can condense the whole thing into a single chain:
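sc
  .parallelize(List(text))
  .flatMap(record => {
    // flatMap drops records that do not validate as a Person (asOpt yields None)
    val personReads = Json.format[Person]
    val jsValue = Json.parse(record)
    personReads.reads(jsValue).asOpt
  })
  .filter(_.lovesPandas)
  .map(person => {
    // Json.toJson needs a Writes[Person]; build it here and pass it explicitly
    val personWrites = Json.format[Person]
    Json.toJson(person)(personWrites).toString
  })
  .saveAsTextFile("files/out/pandainfo.json")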