scala - Consistent indexing and classification of categorical fields

Suppose I have the following Scala code:

import org.apache.spark.ml.feature.StringIndexer

val df = spark.createDataFrame(Seq(
  (0, "a"),
  (1, "b"),
  (2, "c"),
  (3, "a"),
  (4, "a"),
  (5, "c")
)).toDF("id", "category")

val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexer.transform(df)

Now suppose I train an org.apache.spark.mllib.tree.model.DecisionTreeModel on data indexed with this indexer and save the model to a file.

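When I later make predictions on new data, how can I ensure that the indexing applied to it is consistent with the original indexer that was fitted on the original data to build the model?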

1 answer: