Unable to find proto buffer class in Spark

Asked: 2016-11-30 10:06:06

Tags: apache-spark, protocol-buffers

I have recently been learning Spark, and I've run into a problem with protocol buffers. When I run the code below on Spark, I get "java.lang.RuntimeException: Unable to find proto buffer class".

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext
import com.oreilly.learningsparkexamples.proto.Places
import com.twitter.elephantbird.mapreduce.io.ProtobufWritable
import com.twitter.elephantbird.mapreduce.output.LzoProtobufBlockOutputFormat

object BasicSaveProtoBuf {
  def main(args: Array[String]) {
    val master = args(0)
    val outputFile = args(1)
    val sc = new SparkContext(master, "BasicSaveProtoBuf", System.getenv("SPARK_HOME"))
    val conf = new Configuration()
    LzoProtobufBlockOutputFormat.setClassConf(classOf[Places.Venue], conf)
    // Build a sample protobuf message.
    val dnaLounge = Places.Venue.newBuilder()
    dnaLounge.setId(1)
    dnaLounge.setName("DNA Lounge")
    dnaLounge.setType(Places.Venue.VenueType.CLUB)
    val data = sc.parallelize(List(dnaLounge.build()))
    // Wrap each message in a ProtobufWritable so Hadoop can write it.
    val outputData = data.map { pb =>
      val protoWritable = ProtobufWritable.newInstance(classOf[Places.Venue])
      protoWritable.set(pb)
      (null, protoWritable)
    }
    outputData.saveAsNewAPIHadoopFile(outputFile, classOf[Text], classOf[ProtobufWritable[Places.Venue]],
      classOf[LzoProtobufBlockOutputFormat[ProtobufWritable[Places.Venue]]], conf)
  }
}

places.proto is:

message Venue {
    required int32 id = 1;
    required string name = 2;
    required VenueType type = 3;
    optional string address = 4;

    enum VenueType {
        COFFEESHOP = 0;
        WORKPLACE = 1;
        CLUB = 2;
        OMNOMNOM = 3;
        OTHER = 4;
    }
}
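
(Note: the fully qualified class name com.oreilly.learningsparkexamples.proto.Places$Venue in the stack trace below implies the proto file also carries Java codegen options roughly like the following; this is a sketch inferred from that class name, not shown in the file above:)

// Assumed options, reconstructed from the class name in the stack trace.
option java_package = "com.oreilly.learningsparkexamples.proto";
option java_outer_classname = "Places";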

The exception log is:

16/11/30 09:34:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: java.lang.RuntimeException: Unable to find proto buffer class
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1141)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Unable to find proto buffer class
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:775)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1104)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1807)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
    at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:74)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1138)
    ... 20 more
Caused by: java.lang.ClassNotFoundException: com.oreilly.learningsparkexamples.proto.Places$Venue
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:191)
    at com.google.protobuf.GeneratedMessageLite$SerializedForm.readResolve(GeneratedMessageLite.java:768)
    ... 37 more

When I checked the protocol buffer class, Places.Venue was in fact present in the jar built from the compiled classes. Has anyone run into this problem before? Any help is appreciated! Does no one know the answer to this?

2 Answers:

Answer 0 (score: 1):

After a lot of searching, I finally solved the problem by adding

spark.serializer  org.apache.spark.serializer.KryoSerializer

to the spark-defaults.conf file.
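
The same setting can also be applied programmatically when constructing the context; here is a minimal sketch of that alternative (the app name is taken from the question's code):

import org.apache.spark.{SparkConf, SparkContext}

// Use Kryo instead of Java serialization, which avoids the
// GeneratedMessageLite readResolve class lookup that fails here.
val conf = new SparkConf()
  .setAppName("BasicSaveProtoBuf")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
val sc = new SparkContext(conf)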

Answer 1 (score: 0):

Your exception,

Caused by: java.lang.ClassNotFoundException: com.oreilly.learningsparkexamples.proto.Places$Venue

states that the class

com.oreilly.learningsparkexamples.proto.Places$Venue

cannot be found on your classpath.

You can add the jar containing this class to the spark.executor.extraClassPath option in spark-defaults.conf, so that it is on the classpath of every executor process when you access it inside the rdd.map function (which executes on the executors):

val outputData = data.map{ pb =>
  val protoWritable = ProtobufWritable.newInstance(classOf[Places.Venue]);
  protoWritable.set(pb)
  (null, protoWritable)
}
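
For example, a hypothetical spark-defaults.conf entry would look like this (the jar path is illustrative; use the actual location of your application jar):

# Illustrative path -- the jar must exist here on every worker node
spark.executor.extraClassPath  /path/to/your-app-with-protos.jar

Note that spark.executor.extraClassPath only prepends an entry to the executor classpath; the jar must already be present at that path on every worker node. If it is not, passing it with spark-submit --jars is an alternative that distributes the jar to the executors for you.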

All Spark properties and their defaults are documented here: Spark Configuration