Question

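I am trying to measure the training and prediction times of MLlib classification algorithms.

I am now running my code on 11000000 records, and the prediction time is the same as with only 1000 records (~20 ms). Is it possible that the transform method works in some kind of lazy mode?

The code I use: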

BenchmarkUtil.startTime()
val trainModel = pipeline.fit(trainingData)
val trainTime = BenchmarkUtil.getProcessingTime()
println(className + " Train time [ms]: " + trainTime)

// Make predictions.
BenchmarkUtil.startTime()
val predictions = trainModel.transform(testData)
val testTime = BenchmarkUtil.getProcessingTime()
println(className + " Prediction time [ms]: " + testTime)

Sample output for 11000000 records, split into 80% training data and 20% test data:

RandomForrestClassifierAlgorithm$ Train time [ms]: 2547637
RandomForrestClassifierAlgorithm$ Prediction time [ms]: 20

Answer 1

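It turned out that I have to perform an action on the transformed data for the transformation to actually be executed.

When I call collect on the transformed data, the prediction time is measured as expected.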
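A minimal sketch of what forcing the evaluation could look like, assuming the fitted model is a `PipelineModel` and the test set is a `DataFrame`. The `PredictionTiming` object and its `timed` helper are hypothetical names introduced here for illustration; they stand in for the `BenchmarkUtil` calls from the question, whose implementation is not shown.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame

object PredictionTiming {
  // Hypothetical helper (not from the original post): evaluates `block`
  // and returns its result together with the elapsed wall-clock time in ms.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // transform() on its own generally only describes the computation;
  // the collect() action inside the timed block is what makes Spark run it.
  def predictionTimeMs(model: PipelineModel, testData: DataFrame): Long = {
    val (_, elapsedMs) = timed {
      val predictions = model.transform(testData)
      predictions.collect() // action: brings every predicted row to the driver
    }
    elapsedMs
  }
}
```

With the names from the question, this would be called as `PredictionTiming.predictionTimeMs(trainModel, testData)`. Note that `collect()` copies all rows to the driver, so the measured time also covers that transfer and, unless `testData` is cached, reading the test data itself.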
