How do I save a Spark MLlib KMeans model? model.save results in an empty directory

Date: 2019-01-31 09:14:39

Tags: scala apache-spark spark-streaming k-means apache-spark-mllib

I have a streaming source, and I first want to export a fitted K-Means model from it. Later I plan to load this model into a StreamingKMeans. This is how far I have got, but the save operation produces an empty /data folder.

    srtLabeledPoints.foreachRDD { rdd =>
      import sparkSession.implicits._

      val testTrain = rdd.randomSplit(Array(0.3, 0.7))
      val test = testTrain(0)
      val train = testTrain(1)

      val model = kMeans.run(train.map(f => f.features))
      val a = model.predict(test.map(f => f.features))
      println("******************")
      a.take(50).foreach(println)
      // save returns Unit; a.id (the prediction RDD's unique id) varies the output path per batch
      model.save(sparkSession.sparkContext, "/mnt/c/Users/ssss/ml/oooModel3" + a.id)
      println("******************")

      rdd.unpersist()
    }

Here is the kMeans definition:

val kMeans   = new org.apache.spark.mllib.clustering.KMeans()
  .setK(20) //# of clusters
  .setSeed(31)

I can see the cluster assignments (values between 0 and 19) printed to the console by the take. However, /data is empty, and /metadata contains a 1 KB file with the following content:

{"class":"org.apache.spark.mllib.clustering.KMeansModel","version":"2.0","k":20,"distanceMeasure":"euclidean","trainingCost":5.147088938203919E11}

What am I doing wrong when saving the model?
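For reference, the end-to-end flow I am aiming for, reduced to a standalone sketch: the local master, toy data, and temp path below are my own stand-ins, not my actual setup, but the save/load calls are the standard mllib API.

```scala
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel, StreamingKMeans}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[2]").setAppName("kmeans-save-load"))

// Toy batch data: two well-separated blobs.
val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))

// Fit the batch model, as in the streaming code above.
val model = new KMeans().setK(2).setSeed(31).run(points)

// Save under an explicit file: URI; the target directory must not exist yet.
val path = "file://" +
  java.nio.file.Files.createTempDirectory("kmeans").toString + "/model"
model.save(sc, path)

// Load the model back and seed a StreamingKMeans with the batch centres,
// giving every centre an equal initial weight.
val loaded = KMeansModel.load(sc, path)
val streamingKMeans = new StreamingKMeans()
  .setK(loaded.k)
  .setDecayFactor(1.0)
  .setInitialCenters(loaded.clusterCenters, Array.fill(loaded.k)(1.0))
```

Passing the batch model's clusterCenters to setInitialCenters (with per-centre weights) is, as far as I can tell, the usual way to hand a trained KMeansModel to StreamingKMeans, since StreamingKMeans has no load method of its own.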

0 Answers:

There are no answers.