I generated a DecisionTree model with PySpark and got the following output:
print(model._call_java('toDebugString'))
  If (feature 26 <= 12.0)
   If (feature 40 <= 0.0)
    If (feature 16 <= 0.0)
     Predict: 0.0
    Else (feature 16 > 0.0)
     Predict: 1.0
   Else (feature 40 > 0.0)
    If (feature 39 <= 7.0)
     Predict: 1.0
    Else (feature 39 > 7.0)
     Predict: 0.0
  Else (feature 26 > 12.0)
   If (feature 40 <= 0.0)
    If (feature 25 <= 96.0)
     Predict: 0.0
    Else (feature 25 > 96.0)
     Predict: 0.0
   Else (feature 40 > 0.0)
    If (feature 28 <= 110.0)
     Predict: 0.0
    Else (feature 28 > 110.0)
     Predict: 0.0
I have mapped the feature indices (such as "feature 28") back to the feature names in a very cumbersome way:
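import re

def isint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

# Map each feature index to its input column name
dd = {}
for i, col in enumerate(assembler.getInputCols()):
    dd[i] = col.replace(' as bigint', '')

# Reduce the debug string to indices, thresholds and match/no match labels
mytree = model.stages[2]._call_java('toDebugString')\
    .replace('feature', '')\
    .replace('If (', '')\
    .replace('Else (', '')\
    .replace('Predict: 1.0', 'match')\
    .replace('Predict: 0.0', 'no match')\
    .replace(')', '')

# Collect the integer tokens (the feature indices); the first two come
# from the header line (tree depth and node count), so skip them
ff = []
for split in mytree.split(' '):
    if isint(split):
        ff.append(split)
feature_clean = list(set(ff[2:]))

# Drop the header line
mt = mytree.split('\n')
mt.pop(0)
mt = '\n'.join(mt)

# Only replace an index that is followed by a comparison operator, so that
# e.g. feature 2 does not clobber 26 and thresholds such as 12.0 stay intact
for i in feature_clean:
    mt = re.sub(r' %s(?= [<>])' % i, dd[int(i)], mt)
print(mt.replace(' ', ''))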
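A shorter route to the same renamed tree could be a single `re.sub` with a callback, sketched below. It assumes the debug string uses the "feature N" form shown above and that `names` is the list of input columns in assembler order (with " as bigint" already stripped, as in the loop above). The column names `age` and `num_visits` and the trimmed-down tree are made up for illustration.

```python
import re


def name_features(debug_string, names):
    """Replace every 'feature N' in a Spark tree debug string with names[N]."""
    # \d+ is greedy, so 'feature 26' is read as index 26, never 2;
    # thresholds such as 12.0 are untouched because they lack the 'feature ' prefix
    return re.sub(r"feature (\d+)",
                  lambda m: names[int(m.group(1))],
                  debug_string)


# Hypothetical column names standing in for the cleaned-up
# assembler.getInputCols() list (index i -> name of input column i)
names = ["col_%d" % i for i in range(41)]
names[26] = "age"
names[40] = "num_visits"

# A trimmed-down copy of the debug string from the question
tree = """If (feature 26 <= 12.0)
 If (feature 40 <= 0.0)
  Predict: 0.0
 Else (feature 40 > 0.0)
  Predict: 1.0
Else (feature 26 > 12.0)
 Predict: 0.0"""

print(name_features(tree, names))
```

Because only the "feature N" tokens change, the result keeps the original If/Else/Predict layout, including indentation, instead of the space-stripped version.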
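So at least I have the same structure with the feature names in it. I would like to generate a tree plot like this one. Is that possible without even more horrible parsing code?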
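For the plot itself, one option is to parse the debug string into a nested structure once and emit Graphviz DOT text, which Graphviz's `dot` tool can render to an image. The sketch below assumes the If/Else/Predict line layout shown above (a condition line, its left subtree, the matching Else line, then the right subtree). Because it pairs each If with its Else by reading lines in order, it does not rely on indentation. The sample tree is a trimmed copy of the question's output, and the yes/no edge labels are an arbitrary choice.

```python
import re


def parse(lines, pos=0):
    """Parse Spark toDebugString lines recursively, starting at lines[pos].

    Returns (node, next_pos), where node is either {"predict": value}
    or {"cond": text, "left": node, "right": node}.
    """
    line = lines[pos].strip()
    if line.startswith("Predict:"):
        return {"predict": line.split(":", 1)[1].strip()}, pos + 1
    match = re.match(r"If \((.*)\)$", line)
    if match is None:
        raise ValueError("expected If or Predict, got %r" % line)
    # An If line is followed by its left subtree, an Else line, then the right subtree
    left, pos = parse(lines, pos + 1)
    if not lines[pos].strip().startswith("Else"):
        raise ValueError("expected Else, got %r" % lines[pos])
    right, pos = parse(lines, pos + 1)
    return {"cond": match.group(1), "left": left, "right": right}, pos


def to_dot(debug_string):
    """Return Graphviz DOT source for a Spark decision-tree debug string."""
    lines = [l for l in debug_string.splitlines() if l.strip()]
    # Skip a header line (e.g. 'DecisionTreeClassificationModel ... nodes') if present
    start = next(i for i, l in enumerate(lines)
                 if l.strip().startswith(("If", "Predict")))
    root, _ = parse(lines, start)
    out = ["digraph tree {", "  node [shape=box];"]
    ids = [0]  # mutable counter shared with the nested function

    def emit(node):
        nid = ids[0]
        ids[0] += 1
        if "predict" in node:
            label = "Predict: " + node["predict"]
        else:
            label = node["cond"]
        # Escape double quotes so the label stays a valid DOT string
        out.append('  n%d [label="%s"];' % (nid, label.replace('"', '\\"')))
        if "cond" in node:
            left, right = emit(node["left"]), emit(node["right"])
            out.append('  n%d -> n%d [label="yes"];' % (nid, left))
            out.append('  n%d -> n%d [label="no"];' % (nid, right))
        return nid

    emit(root)
    out.append("}")
    return "\n".join(out)


sample = """If (feature 26 <= 12.0)
 If (feature 40 <= 0.0)
  Predict: 0.0
 Else (feature 40 > 0.0)
  Predict: 1.0
Else (feature 26 > 12.0)
 Predict: 0.0"""

print(to_dot(sample))
```

Writing the returned text to a file such as tree.dot and running `dot -Tpng tree.dot -o tree.png` should produce the image. Running a feature-name substitution on the debug string first would put column names in the boxes instead of indices.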