H2O DistributedRandomForest: predictions of all trees

Time: 2019-02-26 14:25:49

Tags: python tree random-forest h2o

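I am using H2O (version 3.22.1.3) from Python, and I would like to know whether it is possible to observe the prediction of every tree in a random forest, as one can with the estimators_ attribute of scikit-learn's RandomForestRegressor. I tried the model's predict_leaf_node_assignment() method, but for each tree it returns either the decision path or (presumably) the ID of the leaf node on which the prediction is based. In the latest version H2O added a Tree class, but unfortunately it has no predict() method. Although I can access any node in any tree of a random forest, my own tree prediction function built on the recently added Tree API (even if it is correct) is very slow. So, my questions are: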

(a) Can I obtain tree predictions natively? If so, how?

(b) If not, do the H2O developers plan to implement this feature in a future release?

Any response would be appreciated.
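The idea behind question (a) can be sketched as follows. Suppose you know which leaf each row reaches in a tree, which is what predict_leaf_node_assignment(..., type='Node_ID') reports. Suppose you also know the value stored in each leaf, which the H2OTree class exposes through its node_ids and predictions lists. A tree's prediction is then just a lookup. The sketch below is plain Python on made-up toy data. The helpers predict_from_leaves and forest_prediction are hypothetical and not part of H2O. It builds an id-to-value dict once per tree, so each row costs a constant-time lookup instead of a search over all nodes. It also assumes a regression forest whose overall prediction is the mean of its trees' predictions.

```python
# Sketch: per-tree predictions from leaf assignments (pure Python, toy data).
# ASSUMPTION: node_ids / predictions mimic the parallel lists exposed by
# h2o.tree.H2OTree; leaf ids mimic one column of
# predict_leaf_node_assignment(..., type='Node_ID'). All values are made up.
from statistics import mean


def predict_from_leaves(node_ids, predictions, leaf_assignment):
    """Hypothetical helper: map each row's leaf node id to that leaf's value."""
    # Build the id -> value lookup once per tree; each row is then O(1).
    value_of = dict(zip(node_ids, predictions))
    return [value_of[n] for n in leaf_assignment]


def forest_prediction(per_tree):
    """Hypothetical helper: regression forest = row-wise mean over trees."""
    return [mean(row) for row in zip(*per_tree)]


# Two toy trees; internal nodes carry no prediction (None here).
tree_1 = ([0, 1, 2], [None, 1.0, 3.0])
tree_2 = ([0, 1, 2, 3, 4], [None, None, 2.0, 4.0, 6.0])
# Leaf ids reached by the same three rows in each tree.
leaves_1 = [1, 2, 2]
leaves_2 = [2, 3, 4]

per_tree = [
    predict_from_leaves(*tree_1, leaves_1),
    predict_from_leaves(*tree_2, leaves_2),
]
print(per_tree)                     # [[1.0, 3.0, 3.0], [2.0, 4.0, 6.0]]
print(forest_prediction(per_tree))  # [1.5, 3.5, 4.5]
```

With pandas, the same dict could replace the np.where search used in the workaround in the update. One way to write it would be leaf_nodes[col].map(dict(zip(tr.node_ids, tr.predictions))). That one-liner is an untested suggestion, not a verified H2O recipe.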

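Update: Thank you for your answer, Joe. For now (until the feature is implemented natively), this is the only workaround I could come up with for generating tree predictions.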

# Suppose we have a random forest model called drf with ntrees=70 and want to make predictions on df_valid.
# After executing the code below, we get a DataFrame tree_predictions with ntrees (here 70) columns,
# where the i-th column holds the predictions of the i-th tree, and the same number of rows as df_valid.
# The per-tree predictions can then be used, e.g., to create prediction intervals.
import numpy as np
import pandas as pd
from h2o.tree import H2OTree

# Number of trees
ntrees = 70

# Extract all trees of drf into a list of H2OTree objects
list_of_trees = [H2OTree(model=drf, tree_number=t, tree_class=None) for t in range(ntrees)]

# leaf_nodes holds, for every row of df_valid and every tree, the node_id of the leaf that makes the prediction
leaf_nodes = drf.predict_leaf_node_assignment(df_valid, type='Node_ID').as_data_frame()

# tree_predictions is the DataFrame with the predictions of all 70 trees
tree_predictions = pd.DataFrame(columns=['T' + str(t + 1) for t in range(ntrees)])
for t in range(ntrees):
    tr = list_of_trees[t]
    node_ids = np.array(tr.node_ids)
    # Look up the prediction stored at the node whose id equals the leaf id n
    treePred = lambda n: tr.predictions[np.where(node_ids == n)[0][0]]
    tree_predictions['T' + str(t + 1)] = leaf_nodes['T' + str(t + 1)].apply(treePred)
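The comments in the workaround mention using the per-tree predictions to create prediction intervals. One simple option is to take empirical percentiles of each row's tree predictions. The sketch below is plain Python on made-up rows. The helpers percentile and tree_interval are hypothetical, and the 5th/95th percentile bounds are an arbitrary choice. Note that the spread across trees reflects disagreement between trees rather than the full uncertainty about a new observation, so such intervals are a heuristic, not calibrated prediction intervals.

```python
# Sketch: empirical per-row intervals from per-tree predictions (pure Python).
# ASSUMPTION: each inner list stands for one row of tree_predictions
# (one value per tree); the numbers below are made up for illustration.


def percentile(values, q):
    """Hypothetical helper: linear-interpolation percentile, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0  # fractional index into the sorted values
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


def tree_interval(row, lower=5.0, upper=95.0):
    """Hypothetical helper: (lower bound, mean, upper bound) of one row."""
    return percentile(row, lower), sum(row) / len(row), percentile(row, upper)


rows = [
    [1.0, 2.0, 3.0, 4.0, 5.0],       # trees disagree
    [10.0, 10.0, 10.0, 10.0, 10.0],  # trees agree
]
for row in rows:
    print(tree_interval(row))
```

With the DataFrame from the workaround, a row could be passed as tree_interval(tree_predictions.iloc[i].astype(float).tolist()). This is an untested suggestion. NumPy's np.percentile with its default linear method would give the same bounds in vectorized form.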

1 answer:

Answer 0 (score: 0)

For now, the answer is no. We have created an issue to implement this new feature in the Tree API. You can track its progress here: https://0xdata.atlassian.net/browse/PUBDEV-6322