org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, svr17933hw2288.hadoop.sh.ctripcorp.com, executor 1): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$createTransformFunc$1: (string) => array<string>)
The code is as follows:
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Split the text column into words
val tokenizer = new Tokenizer().setInputCol("sendcontent").setOutputCol("words")
val wordsData = tokenizer.transform(sourDF)
// Hash the words into a 20-dimensional term-frequency vector
val hashingTF = new HashingTF()
  .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)
// Rescale the term frequencies by inverse document frequency
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
rescaledData.select("features", "msgid").take(3).foreach(println)
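
The stack trace above is cut off before any "Caused by" line, so the root cause is not visible here. One common cause of this particular failure is a null value in the Tokenizer input column: Tokenizer lowercases and splits each string inside a UDF, and a null input raises a NullPointerException that Spark reports as "Failed to execute user defined function". Below is a minimal sketch of guarding against that, assuming sendcontent really does contain nulls. The sample rows and the helper name dropNullText are hypothetical, since the real sourDF is not shown in the post.

```scala
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.{DataFrame, SparkSession}

object NullSafeTokenize {
  // Hypothetical helper: drop rows whose text column is null,
  // so the Tokenizer UDF never receives a null string.
  def dropNullText(df: DataFrame, textCol: String): DataFrame =
    df.na.drop(Seq(textCol))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-safe-tokenize")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-in for sourDF; the second row has a null sendcontent.
    val sourDF = Seq[(String, String)](
      ("m1", "hello spark"),
      ("m2", null),
      ("m3", "tf idf example")
    ).toDF("msgid", "sendcontent")

    val cleaned = dropNullText(sourDF, "sendcontent")

    val tokenizer = new Tokenizer().setInputCol("sendcontent").setOutputCol("words")
    val wordsData = tokenizer.transform(cleaned)
    wordsData.select("msgid", "words").show(truncate = false)

    spark.stop()
  }
}
```

If the rows with missing text have to be kept, sourDF.na.fill("", Seq("sendcontent")) replaces the nulls with empty strings instead of dropping them. Either way, the full executor log should be checked for the "Caused by" entry to confirm that nulls are actually the problem.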