This problem has bothered me for a few days, and I can't find a detailed cause in the Spark history server logs.
```scala
import org.apache.spark.ml.feature.Word2VecModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{collect_list, concat_ws, lit}

val w2v = Word2VecModel.load("/user/w2v_model")
val vectors = w2v.getVectors.selectExpr("word as hid", "vector")
...
val broadcastBrp = spark.sparkContext.broadcast(brp)
val broadcastVec = spark.sparkContext.broadcast(vectors)
...
val recomRes = sample.rdd.map { row =>
  val uid = row.getInt(0)
  val vec = row.getAs[Vector](1)
  // It seems broadcastVec.value is null here. What could I do?
  val t = broadcastBrp.value.approxNearestNeighbors(broadcastVec.value, vec, 5)
    .select("hid", "distCol").withColumn("uid", lit(uid))
  val c = t.withColumn("concat", concat_ws("_", t.col("uid"), t.col("hid"), t.col("distCol")))
    .select(collect_list("concat").alias("list"))
  c.head()
}
```
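My suspicion is that broadcasting `vectors` is the problem: a Dataset carries a reference to the SparkSession on the driver, so dereferencing `broadcastVec.value` inside an executor-side closure yields an object with no live session behind it, which would explain the NullPointerException in `Dataset.schema`. If that is right, one workaround is to move the loop to the driver. A minimal sketch, assuming `sample` is small enough to `collect()` and `brp` is the LSH model broadcast above:

```scala
// Minimal sketch: run the Dataset operation on the driver instead of
// inside rdd.map on an executor. Assumes `sample` fits in driver memory.
val recomRes = sample.collect().map { row =>
  val uid = row.getInt(0)
  val vec = row.getAs[Vector](1)
  val t = brp.approxNearestNeighbors(vectors, vec, 5)
    .select("hid", "distCol").withColumn("uid", lit(uid))
  t.withColumn("concat", concat_ws("_", t.col("uid"), t.col("hid"), t.col("distCol")))
    .select(collect_list("concat").alias("list"))
    .head()
}
```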
I packaged the code into a jar with sbt assembly. It runs fine with master `local[*]`, but in yarn-client mode it fails with the error below:

```
java.lang.NullPointerException
at org.apache.spark.sql.Dataset.schema(Dataset.scala:410)
at org.apache.spark.sql.Dataset.columns(Dataset.scala:461)
at org.apache.spark.ml.feature.LSHModel.approxNearestNeighbors(LSH.scala:112)
at org.apache.spark.ml.feature.LSHModel.approxNearestNeighbors(LSH.scala:180)
at com.xuchuanhua.learning.Test$$anonfun$1.apply(Test.scala:90)
at com.xuchuanhua.learning.Test$$anonfun$1.apply(Test.scala:84)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:936)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
```
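An alternative I am considering, to avoid per-row lookups entirely, is `approxSimilarityJoin`, which performs the neighbour search as one distributed job, so nothing Dataset-related has to run inside an executor. A sketch, assuming `sample` exposes a `uid` column plus the model's input feature column, and that a distance threshold (the `2.5` here is a placeholder) is acceptable in place of a fixed top-5:

```scala
// Sketch: one distributed similarity join instead of a per-row loop.
// approxSimilarityJoin returns all pairs within the threshold rather
// than a fixed number of neighbours, and outputs struct columns
// `datasetA` and `datasetB` plus the distance column.
val joined = brp.approxSimilarityJoin(sample, vectors, 2.5, "distCol")
val recomRes = joined.select(
  concat_ws("_",
    joined("datasetA.uid"),
    joined("datasetB.hid"),
    joined("distCol")).alias("concat"))
```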