I have a question about loading a Spark model. Everything works fine as long as I don't pull in the gRPC dependencies.
Here is my main.scala:
import org.apache.spark.ml.tuning.CrossValidatorModel
import org.apache.spark.sql.SparkSession

object lr_model {
  val spark = SparkSession.builder().appName("ml model").master("local[*]").getOrCreate()
  val model = CrossValidatorModel.load("C:/Users/.....................")

  def testing(subject: String): String = {
    val datatest = spark.createDataFrame(Seq(("CATEGORY_SOCIAL", subject))).toDF("labelss", "subjects")
    // column 6 holds the prediction as a Double; convert it to match the String return type
    val result = model.transform(datatest).head().getDouble(6)
    result.toString
  }

  def main(args: Array[String]): Unit = {
    println(testing("aaaa"))
    spark.stop()
  }
}
Here is my build.sbt:
scalaVersion := "2.11.7"

PB.targets in Compile := Seq(
  scalapb.gen() -> (sourceManaged in Compile).value
)

val scalapbVersion = scalapb.compiler.Version.scalapbVersion
val grpcJavaVersion = scalapb.compiler.Version.grpcJavaVersion

libraryDependencies ++= Seq(
  // spark
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql" % "2.3.1",
  "org.apache.spark" %% "spark-mllib" % "2.3.1",
  // protobuf
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf"
  // for grpc
  //"io.grpc" % "grpc-netty" % grpcJavaVersion
  //"com.thesamet.scalapb" %% "scalapb-runtime-grpc" % scalapbVersion
)
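By "using the gRPC dependencies" I mean uncommenting the last two entries (note that a comma then has to be added after the scalapb-runtime line), roughly like this:

libraryDependencies ++= Seq(
  // spark
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql" % "2.3.1",
  "org.apache.spark" %% "spark-mllib" % "2.3.1",
  // protobuf
  "com.thesamet.scalapb" %% "scalapb-runtime" % scalapbVersion % "protobuf",
  // grpc
  "io.grpc" % "grpc-netty" % grpcJavaVersion,
  "com.thesamet.scalapb" %% "scalapb-runtime-grpc" % scalapbVersion
)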
But when I enable the gRPC dependencies, I get an error like this:
18/07/18 12:59:08 INFO SparkContext: Created broadcast 0 from textFile at ReadWrite.scala:387
[error] (run-main-0) java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
Is this error caused by gRPC, or is there something wrong with my code? Thanks.
Answer 0 (score: 0):
Spark appears to be using Guava 14.0.1, while gRPC-Java requires Guava 20 or later. The method Spark is calling is a @Beta API (which should never be used from a library); it was deprecated in Guava 15.0 (released September 2013) and removed in Guava 17.0 (released April 2014). An issue needs to be filed against Spark to update its code to use/support newer versions of Guava.
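Not part of the answer above, but a workaround that is often suggested for this kind of Guava clash is to shade Guava when building a fat jar with sbt-assembly, so the classes gRPC needs can no longer collide with the Guava 14 that Spark/Hadoop bring in at runtime. A rough sketch under assumptions (the sbt-assembly version and the renamed package prefix are placeholders, and this only helps when running the assembled jar, e.g. via spark-submit with the Spark dependencies marked "provided", not plain sbt run):

// project/plugins.sbt  (assumed plugin version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.7")

// build.sbt -- rename the Guava classes bundled into the assembled jar so the
// copy that gRPC uses cannot clash with the Guava that Hadoop/Spark load
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.guava.@1").inAll
)

The idea is that inside the fat jar every reference to com.google.common is rewritten to the shaded.guava prefix, while the cluster-provided Spark/Hadoop keep using their own Guava 14 untouched.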