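I want to serialize a Scalding TypedPipe[MyClass] and deserialize it in Spark 1.5.1.
Using Kryo and Twitter's Chill for Scala, I was able to serialize/deserialize a "simple" case class that contains only "primitives" such as Booleans and Maps: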
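```scala
// In Scalding
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import com.twitter.scalding.WritableSequenceFile
import org.apache.hadoop.io.{BytesWritable, NullWritable}

case class MyClass(foo: Boolean) // case classes are already Serializable
val data = ... // TypedPipe[MyClass]

def serialize[A](data: A): Array[Byte] = {
  // setRegistrationRequired returns a new instantiator instead of modifying the receiver,
  // so it has to be chained before newKryo()
  val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close() // flushes Kryo's buffer into bao
  bao.toByteArray
}

data.map(t => (NullWritable.get, new BytesWritable(serialize(t))))
  .write(WritableSequenceFile[NullWritable, BytesWritable](outPath))
```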
```scala
// In Spark
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.chill.ScalaKryoInstantiator
import org.apache.hadoop.io.{BytesWritable, NullWritable}

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()
  val input = new Input(new ByteArrayInputStream(ser))
  try kryo.readObject(input, clazz) finally input.close()
}

// sc is the SparkContext
sc.sequenceFile(inPath, classOf[NullWritable], classOf[BytesWritable])
  .map(_._2)
  .map(t => deserialize(t.getBytes, classOf[MyClass])) // getBytes is the non-deprecated name of get
```
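The serialize/deserialize pair can also be checked locally, without Scalding, Hadoop or Spark, by round-tripping a value in memory. This is a minimal sketch, assuming only that chill-scala (which pulls in Kryo) is on the classpath; the object name `KryoRoundTrip` is hypothetical and not part of either job:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.ScalaKryoInstantiator

case class MyClass(foo: Boolean)

// Hypothetical helper: the same logic as the Scalding and Spark snippets, in one JVM
object KryoRoundTrip {
  // A fresh Kryo per call, configured the same way as in both jobs
  private def newKryo(): Kryo =
    new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()

  def serialize[A](data: A): Array[Byte] = {
    val bao = new ByteArrayOutputStream
    val output = new Output(bao)
    newKryo().writeObject(output, data)
    output.close() // flush Kryo's buffer into bao
    bao.toByteArray
  }

  def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
    val input = new Input(new ByteArrayInputStream(ser))
    try newKryo().readObject(input, clazz) finally input.close()
  }
}
```

For example, `KryoRoundTrip.deserialize(KryoRoundTrip.serialize(MyClass(true)), classOf[MyClass])` should give back a value equal to `MyClass(true)`.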
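I was also able to serialize/deserialize a "complex" class whose members are another custom class of mine (MyClass) and an org.joda.time.LocalDate. I register the classes in the same order during serialization and deserialization, as the Kryo documentation requires, and use Kryo's default serializers: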
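```scala
// In Scalding
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.io.Output
import com.twitter.chill.ScalaKryoInstantiator
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

class MyClass2(val bar: MyClass, val someDate: LocalDate) extends Serializable

def serialize[A](data: A): Array[Byte] = {
  val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()
  // Must be registered in exactly the same order as in deserialize (Spark side)
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val bao = new ByteArrayOutputStream
  val output = new Output(bao)
  kryo.writeObject(output, data)
  output.close()
  bao.toByteArray
}
```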
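```scala
// In Spark
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.chill.ScalaKryoInstantiator
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
  val kryo = new ScalaKryoInstantiator().setRegistrationRequired(false).newKryo()
  // Same classes, same order as in serialize (Scalding side)
  kryo.register(classOf[MyClass2])
  kryo.register(classOf[MyClass])
  kryo.register(classOf[LocalDate])
  kryo.register(classOf[ISOChronology])
  kryo.register(classOf[GregorianChronology])
  val input = new Input(new ByteArrayInputStream(ser))
  try kryo.readObject(input, clazz) finally input.close()
}
```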
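The duplicated registration code could live in one place, so that both jobs are guaranteed to register the same classes in the same order. This is a minimal sketch, assuming chill's `IKryoRegistrar` interface and `KryoInstantiator.withRegistrar`, and a hypothetical `MyKryo` object compiled into a jar that both the Scalding and the Spark job depend on:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.{IKryoRegistrar, KryoInstantiator, ScalaKryoInstantiator}
import org.joda.time.LocalDate
import org.joda.time.chrono.{GregorianChronology, ISOChronology}

case class MyClass(foo: Boolean)
class MyClass2(val bar: MyClass, val someDate: LocalDate) extends Serializable

// Hypothetical shared module: the single source of truth for registration order
object MyKryo {
  private object Registrar extends IKryoRegistrar {
    def apply(k: Kryo): Unit = {
      k.register(classOf[MyClass2])
      k.register(classOf[MyClass])
      k.register(classOf[LocalDate])
      k.register(classOf[ISOChronology])
      k.register(classOf[GregorianChronology])
    }
  }

  // Every newKryo() call on this instantiator runs Registrar on the fresh Kryo
  val instantiator: KryoInstantiator =
    new ScalaKryoInstantiator().setRegistrationRequired(false).withRegistrar(Registrar)

  def serialize[A](data: A): Array[Byte] = {
    val bao = new ByteArrayOutputStream
    val output = new Output(bao)
    instantiator.newKryo().writeObject(output, data)
    output.close()
    bao.toByteArray
  }

  def deserialize[A](ser: Array[Byte], clazz: Class[A]): A = {
    val input = new Input(new ByteArrayInputStream(ser))
    try instantiator.newKryo().readObject(input, clazz) finally input.close()
  }
}
```

The Scalding side would then call `MyKryo.serialize(t)` and the Spark side `.map(t => MyKryo.deserialize(t.getBytes, classOf[MyClass2]))`; the registration order is fixed by the one `apply` method instead of being copied into two files.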
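a) As shown above, this works, but it seems overly verbose. Am I missing a simpler way?
b) When I registered only LocalDate, Spark complained that it didn't "know" ISOChronology. When I registered ISOChronology, it complained that it didn't know GregorianChronology. Once I also registered GregorianChronology, Spark stopped complaining and everything worked. Is there a way to register LocalDate "and everything inside it"?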