How do I load a Spark ML model when using gRPC dependencies?

时间:2018-07-18 06:02:50

标签: scala apache-spark dependencies sbt grpc

I have a problem loading a Spark ML model. It works perfectly fine as long as I don't use the gRPC dependencies.

This is my main.scala:

import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Row

object lr_model {
    // Spark session and cross-validated model are created once, at object initialization
    val spark = SparkSession.builder().appName("ml model").master("local[*]").getOrCreate()

    val model = CrossValidatorModel.load("C:/Users/.....................")

    def testing(subject: String): String = {
        val datatest = spark.createDataFrame(Seq(("CATEGORY_SOCIAL", subject))).toDF("labelss", "subjects")
        val result = model.transform(datatest).head().getDouble(6)
        result.toString
    }

    def main(args: Array[String]): Unit = {
        println(testing("aaaa"))
        spark.stop()
    }
}

This is my build.sbt:

scalaVersion := "2.11.7"

PB.targets in Compile := Seq(
  scalapb.gen() -> (sourceManaged in Compile).value
)

val scalapbVersion =
    scalapb.compiler.Version.scalapbVersion
val grpcJavaVersion =
    scalapb.compiler.Version.grpcJavaVersion

libraryDependencies ++= Seq(
    // spark
    "org.apache.spark" %% "spark-core" % "2.3.1" ,
    "org.apache.spark" %% "spark-sql" % "2.3.1" ,
    "org.apache.spark" %% "spark-mllib" % "2.3.1" ,

    // protobuf
    "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf"

    //for grpc
    //"io.grpc" % "grpc-netty" % grpcJavaVersion
    //"com.thesamet.scalapb" %% "scalapb-runtime-grpc" % scalapbVersion
)
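
Enabling gRPC here presumably means uncommenting those two lines (and adding the missing commas), so the dependency list looks roughly like this sketch:

    libraryDependencies ++= Seq(
        // spark
        "org.apache.spark" %% "spark-core" % "2.3.1",
        "org.apache.spark" %% "spark-sql" % "2.3.1",
        "org.apache.spark" %% "spark-mllib" % "2.3.1",

        // protobuf
        "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf",

        // grpc
        "io.grpc" % "grpc-netty" % grpcJavaVersion,
        "com.thesamet.scalapb" %% "scalapb-runtime-grpc" % scalapbVersion
    )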

But when I use the gRPC dependencies, it gives me an error like this:

18/07/18 12:59:08 INFO SparkContext: Created broadcast 0 from textFile at ReadWrite.scala:387
[error] (run-main-0) java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)

Is this error coming from gRPC, or is my code wrong? Thanks.

1 answer:

Answer 0 (score: 0)

It seems Spark is using Guava 14.0.1, while gRPC-Java requires Guava 20 or later. The method Spark is calling is a @Beta API (which should never be used from a library); it was deprecated in Guava 15.0 (released September 2013) and removed in Guava 17.0 (released April 2014). An issue needs to be filed against Spark so that its code is updated to use/support newer versions of Guava.
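
Not a fix in Spark itself, but a workaround sometimes used for this particular IllegalAccessError is to pull in a newer hadoop-client, since Hadoop 2.7+ no longer constructs Guava's Stopwatch in FileInputFormat, so a newer Guava on the classpath stops mattering for that code path. A minimal sketch against the build.sbt above (the 2.7.7 version number is only an example, and the behavior of your model pipeline should still be verified):

    // Workaround sketch: force a Hadoop 2.7+ client so FileInputFormat
    // no longer needs the Guava Stopwatch constructor removed in Guava 17.
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.7.7"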