如何在Spark Scala中反序列化匿名类?

时间:2018-09-12 08:14:21

标签: scala apache-spark deserialization mariadb scala-reflect

我已经创建了ML模型,并使用kyro和Spark在 App1 中保存为MariaDB中的序列化对象。然后从另一个 App2 中读取它。以前它可以正常工作,但是后来我不得不更改代码并在 App1 中使用rdd.zipWithIndex。在那个kyro请求再注册两个类之后,我通过添加以下两行来解决了这个问题

classOf[scala.reflect.ClassTag$$anon$1],
classOf[java.lang.Class[_]]

尽管模型已正确保存(我可以在mariadb中看到它们),但是我的App2无法读取(反序列化)这些模型并给出以下IOException。

Exception in thread "main" java.io.IOException: java.lang.NullPointerException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1207)
    at org.apache.spark.RangePartitioner.readObject(Partitioner.scala:227)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1973)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1565)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2176)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
    at se.uu.farmbio.vs.examples.StandalonePrediction$.loadModel(StandalonePrediction.scala:111)
    at se.uu.farmbio.vs.examples.StandalonePrediction$.run(StandalonePrediction.scala:83)
    at se.uu.farmbio.vs.examples.StandalonePrediction$$anonfun$main$1.apply(StandalonePrediction.scala:47)
    at se.uu.farmbio.vs.examples.StandalonePrediction$$anonfun$main$1.apply(StandalonePrediction.scala:46)
    at scala.Option.map(Option.scala:145)
    at se.uu.farmbio.vs.examples.StandalonePrediction$.main(StandalonePrediction.scala:46)
    at se.uu.farmbio.vs.examples.StandalonePrediction.main(StandalonePrediction.scala)
Caused by: java.lang.NullPointerException
    at org.apache.spark.RangePartitioner$$anonfun$readObject$1.apply$mcV$sp(Partitioner.scala:228)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1204)
    ... 183 more

在这里我想提一下,我没有更改保存模型和加载模型代码,如下所示:

模型保存

private def insertModels(receptorPath: String, rModel: InductiveClassifier[MLlibSVM, LabeledPoint], rPdbCode: String, jdbcHostname: String) {
//Getting filename from Path and trimming the extension
val rName = FilenameUtils.removeExtension(Paths.get(receptorPath).getFileName.toString())
println("JOB_INFO: The value of rName is " + rName)

Class.forName("org.mariadb.jdbc.Driver")
val jdbcUrl = s"jdbc:mysql://" + jdbcHostname + ":3306/db_profile?user=root&password=blahblah"

//Preparation object for writing
val baos = new ByteArrayOutputStream()
val oos = new ObjectOutputStream(baos)
oos.writeObject(rModel)

val rModelAsBytes = baos.toByteArray()
val bais = new ByteArrayInputStream(rModelAsBytes)

val connection = DriverManager.getConnection(jdbcUrl)
if (!(connection.isClosed())) {
  //Writing to Database
  val sqlInsert: PreparedStatement = connection.prepareStatement("INSERT INTO MODELS(r_name, r_pdbCode, r_model) VALUES (?, ?, ?)")

  println("JOB_INFO: Start Serializing")

  // set input parameters
  sqlInsert.setString(1, rName)
  sqlInsert.setString(2, rPdbCode)
  sqlInsert.setBinaryStream(3, bais, rModelAsBytes.length)
  sqlInsert.executeUpdate()

  sqlInsert.close()
  println("JOB_INFO: Done Serializing")

} else {
  println("MariaDb Connection is Close")
  System.exit(1)
}

}

在App2中加载模型

def loadModel() = {
//Connection Initialization
Class.forName("org.mariadb.jdbc.Driver")
val jdbcUrl = s"jdbc:mysql://localhost:3306/db_profile?user=root&password=whatever"
val connection = DriverManager.getConnection(jdbcUrl)

//Reading Pre-trained model from Database
var model: InductiveClassifier[MLlibSVM, LabeledPoint] = null
if (!(connection.isClosed())) {

  val sqlRead = connection.prepareStatement("SELECT r_model FROM MODELS")
  val rs = sqlRead.executeQuery()
  rs.next()

  val modelStream = rs.getObject("r_model").asInstanceOf[Array[Byte]]
  val modelBaip = new ByteArrayInputStream(modelStream)
  val modelOis = new ObjectInputStream(modelBaip)
  model = modelOis.readObject().asInstanceOf[InductiveClassifier[MLlibSVM, LabeledPoint]]

  rs.close
  sqlRead.close
  connection.close()
} else {
  println("MariaDb Connection is Close")
  System.exit(1)
}
model

}

该错误来自模型加载行。当一切正常时,这并没有改变,我相信是zipWithIndex的原因,但我不知道如何解决。

modelOis.readObject().asInstanceOf[InductiveClassifier[MLlibSVM, LabeledPoint]]

App1中的新代码,导致Kyro要求另外两个类进行注册

private def getLabeledTopAndBottom(poses: RDD[String], dsSize: Int, topPer: Float, bottomPer: Float): RDD[String] = {
//what is top % of dsSize
val topN = Math.round((topPer / 100) * dsSize)
//logInfo("JOB_INFO: topN is " + topN)
 //what is bottom % of dsSize
val bottomN = Math.round((bottomPer / 100) * dsSize)
//logInfo("JOB_INFO: bottomN is " + bottomN)
 //Get scores with mols
val molAndScore = poses.map {
  case (mol) => (mol, PosePipeline.parseScore(mol))
}
 //Sort whole DsInit by Score
val top = molAndScore.sortBy { case (mol, score) => score }
  .zipWithIndex() 
  .filter { case ((mol, score), index) => index < topN }
  .map { case ((mol, score), index) => mol }

val bottom = molAndScore.sortBy { case (mol, score) => -score }
  .zipWithIndex()
  .filter { x => x._2 < bottomN }
  .map { case ((mol, score), index) => mol }

//Labeling the top molecules with 1.0
val labledTop = top.map { topMols =>
  labelPose(topMols, 1.0)
}
//Labeling the bottom molecules with 0.0
val labledBottom = bottom.map { bottomMols =>
  labelPose(bottomMols, 0.0)
}
 val topAndBottom = labledTop.union(labledBottom)

topAndBottom

}

0 个答案:

没有答案