How to parallelize predictions on a PySpark ML DataFrame

Time: 2019-07-22 21:37:11

Tags: machine-learning pyspark decision-tree

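I have a random forest model built in PySpark with the ML API. I want to access the individual trees in the random forest, run a prediction with each tree, and compute the variance of the predictions across trees. How can I parallelize this prediction?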

from pyspark.ml.classification import RandomForestClassifier

# Fit a random forest with five trees on the training DataFrame.
forest = RandomForestClassifier(numTrees=5)
forest = forest.fit(cars_train)

# The fitted model exposes its individual decision trees.
forest_trees = forest.trees

forest_trees

[DecisionTreeClassificationModel (uid=dtc_aa66702a4ce9) of depth 5 with 17 nodes,
 DecisionTreeClassificationModel (uid=dtc_99f7efedafe9) of depth 5 with 31 nodes,
 DecisionTreeClassificationModel (uid=dtc_9306e4a5fa1d) of depth 5 with 21 nodes,
 DecisionTreeClassificationModel (uid=dtc_d643bd48a8dd) of depth 5 with 23 nodes,
 DecisionTreeClassificationModel (uid=dtc_a2d5abd67969) of depth 5 with 27 nodes]

# Predict with a single tree.
forest_trees[0].transform(cars_train)

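I can run a prediction with each tree individually. But I want to do this in parallel and get the standard deviation across all of the per-tree predictions. How can I do that?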
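To make the goal concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It chains each tree's `transform` onto the same DataFrame and then computes a row-wise standard deviation over the resulting prediction columns. The helper names `population_std` and `add_tree_prediction_std`, the column names `tree_<i>_prediction` and `tree_pred_std`, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration. The sketch also assumes the trees write to the default output columns (`prediction`, `rawPrediction`, `probability`).

```python
import math
from typing import Sequence


def population_std(values: Sequence[float]) -> float:
    """Population standard deviation of one row's per-tree predictions."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def add_tree_prediction_std(forest_model, df, std_col="tree_pred_std"):
    """Append one prediction column per tree plus their row-wise std.

    Hypothetical helper: assumes every tree uses the default output
    column names "prediction", "rawPrediction" and "probability".
    """
    # Imported lazily so population_std can be used without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    out = df
    pred_cols = []
    for i, tree in enumerate(forest_model.trees):
        name = f"tree_{i}_prediction"
        # transform() is lazy: each call only extends the query plan.
        out = (
            tree.transform(out)
            .withColumnRenamed("prediction", name)
            # Drop the other outputs so the next tree can recreate them.
            .drop("rawPrediction", "probability")
        )
        pred_cols.append(name)

    # Collect the per-tree predictions into an array and reduce it per row.
    std_udf = F.udf(population_std, DoubleType())
    return out.withColumn(std_col, std_udf(F.array(*pred_cols)))
```

With the fitted model from the snippet above, this would be called as `add_tree_prediction_std(forest, cars_train)`. Because every `transform` only adds to the plan, Spark should evaluate all trees in a single job when an action such as `show()` runs, with the rows scored in parallel across partitions, so no explicit threading is needed. A native column expression could replace the Python UDF to avoid its serialization overhead. Note also that for class-label predictions the spread is mainly meaningful for binary or ordinal labels.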

0 answers:

No answers