Question

This is essentially the same question as:

Equivalent of mllib.DecisionTreeModel.toDebugString() in ml.DecisionTreeClassificationModel

but for PySpark.

I used to be able to do something like this:

from pyspark.mllib.tree import DecisionTree
model = DecisionTree.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo=categoricalFeatures, impurity='gini', maxDepth=5, maxBins=16)
print(model.toDebugString())

and get a nice visualization of the decision tree:

DecisionTreeModel classifier of depth 5 with 49 nodes
  If (feature 1 in {0.0})
   If (feature 0 in {0.0})
    If (feature 2 <= 52.0)
     If (feature 3 <= 26.0)
      Predict: 0.0
...

I am trying to port my code to pyspark.ml, but I don't see any way to print the resulting tree:

from pyspark.ml.classification import DecisionTreeClassifier
dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="indexedFeatures", maxDepth=5, maxBins=16, impurity='gini')
model = dt.fit(transformedTrainingData)

When I do:

print(model)

I only get the first line:

DecisionTreeClassificationModel (uid=DecisionTreeClassifier_4cbda3dcd0bddd9d4a0b) of depth 5 with 43 nodes

Any ideas on how to get the nice tree output?

Answer 1

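I found a solution. It's not elegant, and it violates encapsulation and everything you've learned about object-oriented programming, but it works:

print(model._call_java("toDebugString"))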

DecisionTreeClassificationModel (uid=DecisionTreeClassifier_4c3bb548827f07c590e6) of depth 5 with 49 nodes
  If (feature 1 in {0.0})
   If (feature 0 in {1.0,2.0})
    If (feature 2 <= 5.0)
     If (feature 3 <= 26.0)
      Predict: 1.0
     Else (feature 3 > 26.0)
      If (feature 0 in {2.0})
...

Answer 2

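Now (in Spark 2.2) you can also simply call the following. Note that in pyspark.ml, toDebugString is a property, so there are no parentheses:

print(model.toDebugString)

You will get something like: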

DecisionTreeClassificationModel (uid=DecisionTreeClassifier_48b398caca43f9fd5bc1) of depth 15 with 5237 nodes
  If (feature 39 <= 0.09)
   If (feature 11 <= 369.79999999999995)
    If (feature 33 <= 217.75400000000002)
     If (feature 4 <= 3864.0)
      If (feature 33 <= -0.01)
       If (feature 12 <= 2950.0)
        If (feature 33 <= -64.83)
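
If the same code has to run against several Spark versions, the two answers above can be combined into one small helper. This is only a sketch: the function name tree_debug_string is made up for illustration, and it assumes the model exposes toDebugString either as a property (pyspark.ml), as a method (pyspark.mllib), or only through the private _call_java hook from Answer 1.

```python
def tree_debug_string(model):
    """Return the full text dump of a fitted tree model.

    Hypothetical helper (not part of PySpark). It tries the public
    toDebugString first and only then falls back to the private
    _call_java workaround.
    """
    attr = getattr(model, "toDebugString", None)
    if attr is not None:
        # pyspark.ml exposes a property (already a str);
        # pyspark.mllib exposes a method that must be called.
        return attr() if callable(attr) else attr
    # Fallback for pyspark.ml versions without the public attribute.
    # _call_java is private API and may change between releases.
    return model._call_java("toDebugString")


# Usage, assuming `model` is a fitted tree model as in the question:
# print(tree_debug_string(model))
```

Preferring the public attribute keeps the private call as a last resort, so the helper stops relying on it once the running Spark version provides toDebugString directly.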

Equivalent of pyspark.mllib.tree.DecisionTreeModel.toDebugString() in pyspark.ml.classification.DecisionTreeClassificationModel (in Python)
