I have a Parquet file with ID and features columns, and I want to evaluate metrics for a PCA + KNN pipeline.
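import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.KNNClassifier // third-party spark-knn package
import org.apache.spark.ml.feature.PCA
import org.apache.spark.mllib.util.MLUtils
import spark.implicits._ // needed for .toDF() on an RDD (already in scope in spark-shell)

// Load MNIST in LibSVM format (label + sparse feature vector)
val rawDataset = MLUtils.loadLibSVMFile(sc, "data/mnist/mnist.bz2")
  .toDF()
// Convert the old mllib vectors to the ml vectors the Pipeline API expects
val dataset = MLUtils.convertVectorColumnsToML(rawDataset)
// 70/30 train/test split; both splits are cached because they are reused
val Array(train, test) = dataset
  .randomSplit(Array(0.7, 0.3), seed = 1234L)
  .map(_.cache())
// Project the features onto the first 50 principal components
val pca = new PCA()
  .setInputCol("features")
  .setK(50)
  .setOutputCol("pcaFeatures")
// 1-nearest-neighbour classifier on the PCA output
val knn = new KNNClassifier()
  .setTopTreeSize(dataset.count().toInt / 5)
  .setFeaturesCol("pcaFeatures")
  .setPredictionCol("predicted")
  .setK(1)
// fit() returns a fitted PipelineModel (PCA followed by KNN)
val pipeline = new Pipeline()
  .setStages(Array(pca, knn))
  .fit(train)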
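One way to get aggregate metrics out of the fitted `pipeline` is Spark ML's `MulticlassClassificationEvaluator`, which supports the metric names `accuracy`, `weightedPrecision`, `weightedRecall` and `f1`. The sketch below is a minimal illustration, assuming the transformed DataFrame still carries the `label` column produced by `loadLibSVMFile(...).toDF()`; `PipelineMetrics` and `score` are hypothetical names. Because the KNN stage writes to `predicted`, the evaluator's prediction column must be set explicitly (its default is `prediction`).

```scala
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.DataFrame

// Hypothetical helper: scores a DataFrame that already holds a label column
// and a prediction column (e.g. the output of a fitted PipelineModel).
object PipelineMetrics {
  def score(
      predictions: DataFrame,
      labelCol: String = "label",
      predictionCol: String = "predicted"): Map[String, Double] = {
    // "predicted" matches setPredictionCol("predicted") on the KNNClassifier;
    // the evaluator would otherwise look for a column named "prediction".
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol(labelCol)
      .setPredictionCol(predictionCol)

    // weightedPrecision / weightedRecall average the per-class values,
    // weighting each class by its share of the true labels.
    Seq("accuracy", "weightedPrecision", "weightedRecall", "f1")
      .map(name => name -> evaluator.setMetricName(name).evaluate(predictions))
      .toMap
  }
}
```

For example, `PipelineMetrics.score(pipeline.transform(test))` would return a map keyed by metric name. Note that weighted recall always equals accuracy, so weighted precision and the per-class values are usually the more informative numbers.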
Any suggestions for computing recall and precision?
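Per-class precision and recall follow directly from the counts: precision = TP / (TP + FP), recall = TP / (TP + FN). Spark's `org.apache.spark.mllib.evaluation.MulticlassMetrics` exposes these as `precision(label)` and `recall(label)` on an RDD of `(prediction, label)` pairs. The plain-Scala sketch below only illustrates the definitions on a small in-memory sample, which can help sanity-check values; `PerClassMetrics` and `perClass` are hypothetical names, and returning 0.0 for a 0/0 ratio is an assumption of this sketch.

```scala
// Hypothetical illustration of per-class precision/recall from (prediction, label) pairs.
object PerClassMetrics {
  final case class ClassScores(precision: Double, recall: Double)

  def perClass(predictionAndLabels: Seq[(Double, Double)]): Map[Double, ClassScores] = {
    // Every class that appears either as a prediction or as a true label
    val classes = predictionAndLabels.flatMap { case (p, l) => Seq(p, l) }.distinct
    classes.map { c =>
      val tp = predictionAndLabels.count { case (p, l) => p == c && l == c }
      val predictedC = predictionAndLabels.count { case (p, _) => p == c } // TP + FP
      val actualC = predictionAndLabels.count { case (_, l) => l == c }    // TP + FN
      // Assumption: a 0/0 ratio (class never predicted / never present) is reported as 0.0
      val precision = if (predictedC == 0) 0.0 else tp.toDouble / predictedC
      val recall = if (actualC == 0) 0.0 else tp.toDouble / actualC
      c -> ClassScores(precision, recall)
    }.toMap
  }
}
```

On the real test split, the pairs would come from the `predicted` and `label` columns of the transformed DataFrame; for data too large to collect to the driver, `MulticlassMetrics` computes the same quantities in a distributed way.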