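Comparing the Scikit-learn and Spark ML DecisionTree models: the fields listed below match each other closely. The one I cannot map is sklearn's tree.value.
Schema of the Spark ML DecisionTree node data: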
root
|-- id: integer (nullable = true)
|-- prediction: double (nullable = true)
|-- impurity: double (nullable = true)
|-- impurityStats: array (nullable = true)
| |-- element: double (containsNull = true)
|-- gain: double (nullable = true)
|-- leftChild: integer (nullable = true)
|-- rightChild: integer (nullable = true)
|-- split: struct (nullable = true)
| |-- featureIndex: integer (nullable = true)
| |-- leftCategoriesOrThreshold: array (nullable = true)
| | |-- element: double (containsNull = true)
| |-- numCategories: integer (nullable = true)
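A minimal sketch of how the split fields in this schema route a sample from the root to a leaf. It assumes the node rows have already been collected into plain Python dicts with the same field names (for example with Row.asDict(recursive=True) in PySpark). It also assumes that the root has id 0, that a leaf has leftChild == -1, and that numCategories == -1 marks a continuous split whose single threshold is leftCategoriesOrThreshold[0]. index_nodes, predict_row, leaf and toy_rows are hypothetical names, and the toy rows are hand-made, not output from a real model.

```python
from typing import Any, Dict, List, Sequence

Node = Dict[str, Any]


def index_nodes(rows: List[Node]) -> Dict[int, Node]:
    """Key the collected Spark ML node rows by their 'id' field."""
    return {row["id"]: row for row in rows}


def predict_row(nodes: Dict[int, Node], x: Sequence[float], root_id: int = 0) -> float:
    """Route one feature vector from the root to a leaf; return the leaf's prediction.

    Assumptions (hedged, not read from the schema itself):
      * a leaf has leftChild == -1
      * numCategories == -1 marks a continuous split whose threshold is
        leftCategoriesOrThreshold[0]; the sample goes left when x[f] <= threshold
      * otherwise leftCategoriesOrThreshold lists the category values sent left
    """
    node = nodes[root_id]
    while node["leftChild"] != -1:
        split = node["split"]
        feature_value = x[split["featureIndex"]]
        if split["numCategories"] == -1:
            go_left = feature_value <= split["leftCategoriesOrThreshold"][0]
        else:
            go_left = feature_value in split["leftCategoriesOrThreshold"]
        node = nodes[node["leftChild"] if go_left else node["rightChild"]]
    return node["prediction"]


def leaf(node_id: int, prediction: float, stats: List[float]) -> Node:
    """Hypothetical helper that builds a leaf row in the schema's shape."""
    return {"id": node_id, "prediction": prediction, "impurity": 0.0,
            "impurityStats": stats, "gain": -1.0, "leftChild": -1, "rightChild": -1,
            "split": {"featureIndex": -1, "leftCategoriesOrThreshold": [], "numCategories": -1}}


# Hand-made toy tree: the root splits on feature 0 at threshold 1.5.
toy_rows = [
    {"id": 0, "prediction": 0.0, "impurity": 0.5, "impurityStats": [2.0, 2.0],
     "gain": 0.5, "leftChild": 1, "rightChild": 2,
     "split": {"featureIndex": 0, "leftCategoriesOrThreshold": [1.5], "numCategories": -1}},
    leaf(1, 0.0, [2.0, 0.0]),
    leaf(2, 1.0, [0.0, 2.0]),
]

nodes = index_nodes(toy_rows)
print(predict_row(nodes, [1.0]))  # 0.0 (1.0 <= 1.5, goes left)
print(predict_row(nodes, [3.0]))  # 1.0 (3.0 > 1.5, goes right)
```

Following leftChild/rightChild by id, rather than relying on row order, keeps the walk independent of how the rows were collected.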
sklearn tree.feature --> sparkML root.split.featureIndex
sklearn tree.threshold --> sparkML root.split.leftCategoriesOrThreshold
sklearn tree.children_left --> sparkML root.leftChild
sklearn tree.children_right --> sparkML root.rightChild
sklearn tree.value --> sparkML root.?
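One plausible candidate for the missing counterpart is impurityStats. For a Gini/entropy classifier it appears to hold the per-class (weighted) sample counts at each node. Depending on the scikit-learn version, tree.value holds either such counts or the corresponding class fractions, so the sketch below offers both. For a regression tree with variance impurity, the stats are assumed to be [count, sum, sum of squares], which makes the sklearn-style value sum / count. All of this is an assumption, not confirmed Spark documentation.

The sketch converts collected node rows, as plain dicts, into sklearn-style parallel arrays. It uses -1 for missing children and -2 for the feature and threshold at leaves, mirroring sklearn's TREE_LEAF / TREE_UNDEFINED conventions. to_sklearn_arrays, leaf and toy_rows are hypothetical names, and the toy rows are hand-made.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]

TREE_LEAF = -1       # sklearn's marker for "no child" in children_left/right
TREE_UNDEFINED = -2  # sklearn's placeholder for feature/threshold at a leaf


def to_sklearn_arrays(rows: List[Node], task: str = "classification",
                      normalize: bool = False) -> Dict[str, list]:
    """Re-express Spark ML node rows as sklearn-style parallel arrays indexed by node id."""
    by_id = {row["id"]: row for row in rows}
    n = len(by_id)
    # Assumption: Spark assigns contiguous ids 0..n-1, so they can serve as array indices.
    if sorted(by_id) != list(range(n)):
        raise ValueError("expected contiguous node ids 0..n-1")
    out: Dict[str, list] = {"feature": [], "threshold": [], "children_left": [],
                            "children_right": [], "value": []}
    for i in range(n):
        row = by_id[i]
        split = row["split"]
        is_leaf = row["leftChild"] == -1
        out["children_left"].append(TREE_LEAF if is_leaf else row["leftChild"])
        out["children_right"].append(TREE_LEAF if is_leaf else row["rightChild"])
        if is_leaf:
            out["feature"].append(TREE_UNDEFINED)
            out["threshold"].append(float(TREE_UNDEFINED))
        elif split["numCategories"] == -1:
            # Continuous split: the single array element is the threshold.
            out["feature"].append(split["featureIndex"])
            out["threshold"].append(split["leftCategoriesOrThreshold"][0])
        else:
            # sklearn trees only use numeric thresholds, so there is no direct equivalent.
            raise ValueError(f"node {i}: categorical split cannot be expressed as a threshold")
        stats = [float(s) for s in row["impurityStats"]]
        if task == "classification":
            # Assumption: impurityStats = per-class (weighted) sample counts.
            total = sum(stats)
            per_class = [s / total for s in stats] if normalize and total > 0 else stats
            out["value"].append([per_class])  # per-node shape (n_outputs=1, n_classes)
        elif task == "regression":
            # Assumption: impurityStats = [count, sum, sum of squares]; sklearn stores the mean.
            count, label_sum = stats[0], stats[1]
            out["value"].append([[label_sum / count if count > 0 else 0.0]])
        else:
            raise ValueError(f"unknown task: {task}")
    return out


def leaf(node_id: int, prediction: float, stats: List[float]) -> Node:
    """Hypothetical helper that builds a leaf row in the Spark schema's shape."""
    return {"id": node_id, "prediction": prediction, "impurity": 0.0,
            "impurityStats": stats, "gain": -1.0, "leftChild": -1, "rightChild": -1,
            "split": {"featureIndex": -1, "leftCategoriesOrThreshold": [], "numCategories": -1}}


# Hand-made toy tree: the root splits on feature 0 at threshold 1.5.
toy_rows = [
    {"id": 0, "prediction": 0.0, "impurity": 0.5, "impurityStats": [2.0, 2.0],
     "gain": 0.5, "leftChild": 1, "rightChild": 2,
     "split": {"featureIndex": 0, "leftCategoriesOrThreshold": [1.5], "numCategories": -1}},
    leaf(1, 0.0, [2.0, 0.0]),
    leaf(2, 1.0, [0.0, 2.0]),
]

arrays = to_sklearn_arrays(toy_rows)
print(arrays["value"])  # [[[2.0, 2.0]], [[2.0, 0.0]], [[0.0, 2.0]]]
```

A quick way to check the assumption on a real model is to train both libraries on the same data with the same splits. Then compare the converted value arrays with tree.value node by node, with normalize=True if your scikit-learn version stores fractions. Categorical splits raise an error rather than being approximated, because a list of left categories has no faithful single-threshold form.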