Question

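I am looking for a way to predict on a single tuple/row using a Spark ML pipeline. I assembled the pipeline in an earlier job and exported the fitted model by saving it. The pipeline contains a random forest classification model and some preprocessing (a StringIndexer and a VectorIndexer).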

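Now I want to use the pipeline in an event-driven setting, where creating a Dataset for prediction is not feasible. I tried extracting the random forest and predicting directly with model.predict(vector). However, this does not work because of the preprocessing.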

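I looked for a similar single-row/vector function on the full pipeline model but could not find one. It is possible to create a DataFrame from a single row. Understandably, this is super inefficient (see the code below).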

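Question 1: Is there any other way to predict a single data item with a pipeline model?
Question 2: If not, is it possible to create the DataFrame more efficiently?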

Thanks!

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.Row

// Assumes an existing SparkSession named `spark`.
val pipelineModel = PipelineModel.load("target/pipeline.model")
val data = spark.read().format("libsvm").load("/opt/spark-2.3.2/data/mllib/sample_libsvm_data.txt")
val collected = data.collect() as Array<Row>

val schema = data.schema()
val mutableListOfOneRow = mutableListOf<Row>()
var errorCount = 0
var counter = 0

collected.forEach {
    mutableListOfOneRow.add(it)
    val label = it[0] as Double

    // A one-row DataFrame is built for every item; this is the inefficient part.
    val df = spark.createDataFrame(mutableListOfOneRow, schema)
    val result = pipelineModel.transform(df).collect() as Array<Row>
    val firstRow = result[0]
    println("label $label vs prediction ${firstRow[7]}")

    // Compare the string forms of both values.
    if (label.toString() != firstRow[7].toString()) {
        errorCount++
    }
    counter++
    mutableListOfOneRow.clear()
}

Predicting on a single row with a Spark ML pipeline

0 answers: