I have a streaming source, and I first want to export a clustering K-Means model from it; later I plan to load that model into a StreamingKMeans (a sketch of the intended loading step is shown after the kMeans definition below). This is how far I have got, but the save operation leaves the /data folder empty.
srtLabeledPoints.foreachRDD { rdd =>
  import sparkSession.implicits._

  // Split each micro-batch into a test (30%) and a training (70%) set
  val testTrain = rdd.randomSplit(Array(0.3, 0.7))
  val test = testTrain(0)
  val train = testTrain(1)

  // Train K-Means on the training features and predict on the test features
  val model = kMeans.run(train.map(f => f.features))
  val a = model.predict(test.map(f => f.features))

  println("******************")
  a.take(50).foreach(println)

  // Save the model; the RDD id is appended so each batch writes to its own path
  // (model.save returns Unit, so there is nothing useful to assign here)
  model.save(sparkSession.sparkContext, "/mnt/c/Users/ssss/ml/oooModel3" + a.id)
  println("******************")

  rdd.unpersist()
}
Here is the kMeans definition:
val kMeans = new org.apache.spark.mllib.clustering.KMeans()
.setK(20) //# of clusters
.setSeed(31)
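For context, the loading step I have in mind for later looks roughly like this (a minimal sketch that is not wired into the job yet; the path, the decay factor, and the uniform initial weights are placeholders/assumptions on my part):

import org.apache.spark.mllib.clustering.{KMeansModel, StreamingKMeans}

// Load the offline model saved above (placeholder path)
val loaded = KMeansModel.load(sparkSession.sparkContext, "/mnt/c/Users/ssss/ml/oooModel3")

// Seed a StreamingKMeans with the offline cluster centers;
// the uniform weights of 1.0 are an assumption
val streamingKMeans = new StreamingKMeans()
  .setK(loaded.clusterCenters.length)
  .setDecayFactor(1.0)
  .setInitialCenters(loaded.clusterCenters, Array.fill(loaded.clusterCenters.length)(1.0))

// Then, on the feature DStream:
// streamingKMeans.trainOn(featureStream)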
I can see the cluster assignments (values between 0 and 19) printed to the console by the take call. However, /data is empty, and /metadata contains a single ~1 KB file with the following content:
{"class":"org.apache.spark.mllib.clustering.KMeansModel","version":"2.0","k":20,"distanceMeasure":"euclidean","trainingCost":5.147088938203919E11}
What am I doing wrong when saving the model?
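For comparison, my understanding is that saving this model type outside the streaming loop should produce Parquet part files under /data next to the /metadata JSON. A minimal sanity-check sketch of that (the tiny dataset and the path below are made up) would be:

import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// Tiny in-memory dataset just to exercise save/load (values are made up)
val points = sparkSession.sparkContext.parallelize(Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0),
  Vectors.dense(9.1, 9.1)
))

val testModel = new KMeans().setK(2).setSeed(31).run(points)
testModel.save(sparkSession.sparkContext, "/mnt/c/Users/ssss/ml/sanityCheckModel") // hypothetical path

// Loading it back should reproduce the cluster centers
val reloaded = KMeansModel.load(sparkSession.sparkContext, "/mnt/c/Users/ssss/ml/sanityCheckModel")
reloaded.clusterCenters.foreach(println)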