How can I get the best features from a Spark LinearSVC model?

Asked: 2018-01-07 09:41:56

Tags: apache-spark machine-learning feature-selection

I am trying to use ChiSqSelector to determine the best features for a Spark 2.2 LinearSVCModel, as follows:

import org.apache.spark.ml.feature.ChiSqSelector
val chiSelector = new ChiSqSelector().setNumTopFeatures(5).
   setFeaturesCol("features").
   setLabelCol("label").setOutputCol("selectedFeatures")

val pipeline = new Pipeline().setStages(Array(labelIndexer, monthIndexer, hashingTF
   , idf, va, featureIndexer,  chiSelector, lsvc, labelConverter))

val model = pipeline.fit(training)
val importantFeatures = model.selectedFeatures

import org.apache.spark.ml.classification.LinearSVCModel
val LSVCModel= model.stages(6).asInstanceOf[org.apache.spark.ml.classification.
   LinearSVCModel]

val importantFeatures = LSVCModel.selectedFeatures

This gives the error:

<console>:180: error: value selectedFeatures is not a member of 
org.apache.spark.ml.classification.LinearSVCModel
   val importantFeatures = LSVCModel.selectedFeatures

Is it possible to use ChiSqSelector with this model? If not, are there any alternatives?

1 answer:

Answer 0 (score: 0)

LinearSVC does not perform any feature selection. You should extract the ChiSqSelectorModel from the pipeline, not the LinearSVCModel:

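import org.apache.spark.ml.feature.ChiSqSelectorModel
// Stage 6 (zero-based) is chiSelector in the question's pipeline
val chiSqModel = model.stages(6).asInstanceOf[ChiSqSelectorModel]

// Indices of the features that the fitted selector kept
val importantFeatures = chiSqModel.selectedFeatures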
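
A hard-coded stage index breaks as soon as the pipeline is reordered, so it may be safer to look the selector up by type. The following is only a minimal sketch, meant for spark-shell or a Scala script: the toy DataFrame and the names `toy`, `selector`, `toyModel`, and `selected` are made up for illustration and are not part of the question's pipeline.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{ChiSqSelector, ChiSqSelectorModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell this returns the existing `spark`.
val spark = SparkSession.builder().master("local[1]").appName("chisq-sketch").getOrCreate()
import spark.implicits._

// Hypothetical toy data: a label column and a 3-dimensional feature vector.
val toy = Seq(
  (0.0, Vectors.dense(0.0, 2.0, 3.0)),
  (1.0, Vectors.dense(1.0, 1.0, 0.0)),
  (0.0, Vectors.dense(0.0, 3.0, 2.0)),
  (1.0, Vectors.dense(1.0, 0.0, 1.0))
).toDF("label", "features")

// Same selector settings as in the question, but keeping a single feature.
val selector = new ChiSqSelector()
  .setNumTopFeatures(1)
  .setFeaturesCol("features")
  .setLabelCol("label")
  .setOutputCol("selectedFeatures")

val toyModel = new Pipeline().setStages(Array(selector)).fit(toy)

// Find the fitted selector by type rather than by position in the pipeline.
// The result is None if the pipeline contains no ChiSqSelectorModel.
val selected: Option[Array[Int]] =
  toyModel.stages.collectFirst { case m: ChiSqSelectorModel => m.selectedFeatures }

selected.foreach(idx => println(idx.mkString(", ")))
```

Applied to the question's fitted `model`, the same `collectFirst` call would return the indices chosen by `chiSelector` without relying on it being stage 6.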