Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
Answer 0 (score: 0)
I got the same error message when running this code:
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.sql.functions.{col, split}

// needed for the implicit RDD-to-DataFrame conversion used by toDF
import spark.implicits._

val txt = Array("A B B C", "A B D D", "A C D")
val txtDf = spark.sparkContext.parallelize(txt).toDF("txt")

// split each document into an array of tokens
val txtDfSplit = txtDf.withColumn("txt", split(col("txt"), " "))

// create a sparse vector with the number of occurrences
// of each word using CountVectorizer
val cvModel = new CountVectorizer()
  .setInputCol("txt")
  .setOutputCol("features")
  .setVocabSize(4)
  .setMinDF(2)
  .fit(txtDfSplit)

val txtDfTrain = cvModel.transform(txtDfSplit)
txtDfTrain.show(false)
which produces this error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 (TID 25, somehostname.domain, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
I have been reading through the various pages that describe this error, and it appears to be some kind of version conflict. The code runs fine in IntelliJ (standalone); the error only appears when the application is submitted to Spark.
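In case it helps, this ClassCastException typically points at a mismatch between the Scala/Spark versions the jar was compiled against and the versions running on the cluster. Below is a minimal build.sbt sketch showing how the versions can be pinned and the Spark dependencies marked as provided; the version numbers here are assumptions and should be replaced with whatever spark-submit --version reports on your cluster.

// build.sbt -- a minimal sketch; the version numbers are placeholders and
// must match the Scala and Spark versions reported by the cluster
name := "lda-example"

scalaVersion := "2.11.12"          // assumption: cluster runs Scala 2.11

val sparkVersion = "2.4.0"         // assumption: cluster runs Spark 2.4.x

libraryDependencies ++= Seq(
  // "provided" keeps the cluster's own Spark jars from clashing with
  // copies bundled into the application jar at submit time
  "org.apache.spark" %% "spark-sql"   % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)

With the versions aligned and Spark scoped as provided, the same code that works in IntelliJ should deserialize correctly on the executors when submitted to the cluster.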