Convert a PySpark model's toDebugString output to a tree view

Time: 2016-06-07 14:43:42

Tags: python visualization pyspark graphviz

I trained a DecisionTree model in PySpark and got the following output:

print(model._call_java('toDebugString'))

  If (feature 26 <= 12.0)
   If (feature 40 <= 0.0)
    If (feature 16 <= 0.0)
     Predict: 0.0
    Else (feature 16 > 0.0)
     Predict: 1.0
   Else (feature 40 > 0.0)
    If (feature 39 <= 7.0)
     Predict: 1.0
    Else (feature 39 > 7.0)
     Predict: 0.0
  Else (feature 26 > 12.0)
   If (feature 40 <= 0.0)
    If (feature 25 <= 96.0)
     Predict: 0.0
    Else (feature 25 > 96.0)
     Predict: 0.0
   Else (feature 40 > 0.0)
    If (feature 28 <= 110.0)
     Predict: 0.0
    Else (feature 28 > 110.0)
     Predict: 0.0

I have mapped the feature indices (such as feature 28) to the feature names in a very cumbersome way:

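def isint(s):
    """Return True if the string s parses as an integer."""
    try:
        int(s)
        return True
    except ValueError:
        return False

# Map each feature index to its input column name, dropping the ' as bigint' suffix.
dd = {i: col.replace(' as bigint', '') for i, col in enumerate(assembler.getInputCols())}

# Strip the If/Else/feature syntax and relabel the predictions.
mytree = model.stages[2]._call_java('toDebugString')\
.replace('feature','')\
.replace('If (', '')\
.replace('Else (', '')\
.replace('Predict: 1.0', 'match')\
.replace('Predict: 0.0', 'no match')\
.replace(')','')

# Collect every integer token; the first two are skipped
# (presumably the depth and node count from the header line).
ff = []
for split in mytree.split(' '):
    if isint(split):
        ff.append(split)
feature_clean = list(set(ff[2:]))

# Drop the header line and rejoin the remaining lines.
mt = mytree.split('\n')
mt.pop(0)
mt = '\n'.join(mt)

# Substitute each feature index with its column name.
for i in feature_clean:
    mt = mt.replace(' '+str(i),dd[int(i)])
print(mt.replace('     ',''))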
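One caveat with the chained str.replace approach: replacing ' ' + index can also hit a longer number that starts with the same digits (for example ' 2' inside ' 26', or a threshold such as ' 12.0'). A minimal sketch of a regex-based alternative follows. The helper name rename_features, its names argument, and the sample column name age are hypothetical and used for illustration only; the dd dictionary built above could be passed as names.

```python
import re

def rename_features(debug_string, names):
    """Replace each 'feature N' token with names[N].

    names is a dict mapping integer feature indices to column names.
    Indices missing from names are left untouched.
    """
    def _substitute(match):
        index = int(match.group(1))
        return names.get(index, match.group(0))

    # \d+ is greedy, so 'feature 26' is matched as index 26, never as index 2.
    return re.sub(r"feature (\d+)", _substitute, debug_string)

# Small made-up sample in the same shape as the toDebugString output.
sample = """If (feature 26 <= 12.0)
 Predict: 0.0
Else (feature 26 > 12.0)
 Predict: 1.0"""

renamed = rename_features(sample, {26: "age"})
print(renamed)
```

Because the pattern is anchored on the word feature, thresholds and predictions are never rewritten, and the If/Else structure stays intact for further processing.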

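So at least I have the same structure, now with the feature names in it. I would like to generate a tree diagram like this one. Is that possible without even more horrible parsing code?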
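One possible direction, sketched under assumptions rather than taken from any PySpark API: since the debug string lists a binary tree in pre-order (If, left subtree, Else, right subtree), a small recursive parser can turn it into nested dictionaries, which can then be written out as Graphviz DOT text. The function names parse_debug_string and to_dot, the dictionary keys, and the shortened header line in the sample are all hypothetical.

```python
import re

def parse_debug_string(text):
    """Parse If/Else/Predict lines into nested dicts (pre-order, binary tree)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Skip any header lines that precede the first If/Predict line.
    while lines and not lines[0].startswith(("If ", "Predict:")):
        lines.pop(0)
    pos = 0

    def parse():
        nonlocal pos
        line = lines[pos]
        pos += 1
        if line.startswith("Predict:"):
            return {"predict": line.split(":", 1)[1].strip()}
        match = re.match(r"If \((.*)\)$", line)
        if match is None:
            raise ValueError("unexpected line: %r" % line)
        yes_branch = parse()
        if not lines[pos].startswith("Else"):
            raise ValueError("expected Else, got: %r" % lines[pos])
        pos += 1  # the Else line only repeats the negated condition
        no_branch = parse()
        return {"cond": match.group(1), "yes": yes_branch, "no": no_branch}

    return parse()

def to_dot(tree):
    """Render the nested dicts as Graphviz DOT source."""
    out = ["digraph Tree {", "  node [shape=box];"]
    counter = [0]

    def walk(node):
        node_id = counter[0]
        counter[0] += 1
        label = node.get("cond") or "Predict: " + node["predict"]
        out.append('  n%d [label="%s"];' % (node_id, label.replace('"', '\\"')))
        if "cond" in node:
            for edge, key in (("yes", "yes"), ("no", "no")):
                child_id = walk(node[key])
                out.append('  n%d -> n%d [label="%s"];' % (node_id, child_id, edge))
        return node_id

    walk(tree)
    out.append("}")
    return "\n".join(out)

# Made-up sample with a shortened header line.
sample = """DecisionTreeClassificationModel of depth 1 with 3 nodes
  If (feature 26 <= 12.0)
   Predict: 0.0
  Else (feature 26 > 12.0)
   Predict: 1.0
"""
tree = parse_debug_string(sample)
dot = to_dot(tree)
print(dot)
```

Saving the printed text to a file such as tree.dot and running the Graphviz command `dot -Tpng tree.dot -o tree.png` should produce an image. Indentation is ignored on purpose: the pre-order layout alone determines the structure, which keeps the parser short.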

0 answers:

No answers