Converting Seq[(String, Any)] to Seq[(String, org.apache.spark.ml.PredictionModel[_, _])] in Spark

Time: 2018-04-02 06:17:44

Tags: apache-spark spark-dataframe apache-spark-mllib apache-spark-ml

I have trained several different models on my dataset, for example nbModel, dtModel, rfModel, and gbmModel. All of them are machine learning models.

Now, when I store them in a variable

val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))

I get a Seq[(String, Any)]:

models: Seq[(String, Any)] = List((NB,NaiveBayesModel (uid=nb_c35f79982850) with 2 classes), (DT,()), (RF,RandomForestClassificationModel (uid=rfc_3f42daf4ea14) with 15 trees), (GBM,GBTClassificationModel (uid=gbtc_534a972357fa) with 20 trees))

With a single model, for example nbModel, the type is inferred correctly:

 val models = ("NB", nbModel)

Output: models: (String, org.apache.spark.ml.classification.NaiveBayesModel) = (NB,NaiveBayesModel (uid=nb_c35f79982850) with 2 classes)

When I try to combine the prediction columns from these models, I get a type mismatch error:

val mlTrainData= mlData(transferData, "value", models).drop("row_id")

<console>:75: error: type mismatch;
 found   : Seq[(String, Any)]
 required: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])]
       val mlTrainData= mlData(transferData, "value", models).drop("row_id")

My mlData function is defined as follows:

def mlData(inputData: DataFrame, responseColumn: String,
           baseModels: Seq[(String, PredictionModel[_, _])]): DataFrame = {
  baseModels.map { case (name, model) =>
    model.transform(inputData)
      .select("row_id", model.getPredictionCol)
      .withColumnRenamed("prediction", s"${name}_prediction")
  }.reduceLeft((a, b) => a.join(b, Seq("row_id"), "inner"))
    .join(inputData.select("row_id", responseColumn), Seq("row_id"), "inner")
}

Output: mlData: (inputData: org.apache.spark.sql.DataFrame, responseColumn: String, baseModels: Seq[(String, org.apache.spark.ml.PredictionModel[_, _])])org.apache.spark.sql.DataFrame

1 answer:

Answer 0 (score: 0)

Please replace the code

val models = Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel), ("GBM",gbmModel))

with

val models = Seq(("NB", nbModel), ("DT", null : org.apache.spark.mllib.tree.model.DecisionTreeModel), ("RF", rfModel), ("GBM",gbmModel))

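My point is that your dtModel is bound to (), whose type is Unit. The element type of the whole sequence therefore becomes the common supertype of the model types and Unit, which is Any. You need to make sure that dtModel really has type DecisionTreeModel, or that it is null if you have already handled the null case. An empty DecisionTreeModel would also work.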
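
To illustrate the inference problem and one way to recover a typed sequence, here is a minimal sketch. It assumes nothing about how the real models were trained: PredictionModelStandIn, NaiveBayesStandIn, RandomForestStandIn and ModelSeqSketch are hypothetical stand-ins, introduced only so the example runs without Spark on the classpath. It shows that a single Unit entry widens the element type to Any, and that a collect with a type pattern keeps only the entries that really are models.

```scala
// Hypothetical stand-ins for Spark ML model classes (not the real API),
// so that the sketch compiles and runs without Spark.
trait PredictionModelStandIn { def uid: String }
final case class NaiveBayesStandIn(uid: String) extends PredictionModelStandIn
final case class RandomForestStandIn(uid: String) extends PredictionModelStandIn

object ModelSeqSketch {
  val nbModel = NaiveBayesStandIn("nb_1")
  val rfModel = RandomForestStandIn("rfc_1")

  // Stands in for a dtModel that was accidentally bound to Unit,
  // matching the (DT,()) entry in the question's output.
  val dtModel: Unit = ()

  // The only common supertype of the model classes and Unit is Any,
  // so this is the element type the compiler would infer without an annotation.
  val untyped: Seq[(String, Any)] =
    Seq(("NB", nbModel), ("DT", dtModel), ("RF", rfModel))

  // Keep only the entries whose second element really is a model.
  val typed: Seq[(String, PredictionModelStandIn)] =
    untyped.collect { case (name, model: PredictionModelStandIn) => (name, model) }

  // Names of the entries that were dropped, to help find the broken model.
  val rejected: Seq[String] =
    untyped.collect {
      case (name, other) if !other.isInstanceOf[PredictionModelStandIn] => name
    }

  def main(args: Array[String]): Unit = {
    println(typed.map(_._1).mkString(", "))
    println(rejected.mkString(", "))
  }
}
```

With real Spark models, the same idea would presumably use the type pattern `model: org.apache.spark.ml.PredictionModel[_, _]` in place of the stand-in trait. Filtering only hides the symptom, though: the DT entry is silently dropped, so the underlying fix is still to make sure dtModel is bound to a fitted model rather than to ().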