How is the root node split value determined?

Time: 2019-04-28 15:50:35

Tags: scikit-learn regression decision-tree

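I am fitting a DecisionTreeRegressor with sklearn and want to work out by hand how the root node split is calculated.

I calculated the variance for LossType, but the tree that sklearn builds shows a value of 0.5 for LossType, and my value is different.

Input: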

Location  LossType  FrontBumbper  RightSide  Duration(Days)
0         0         1             1          10
0         0         0             1          5
0         1         1             0          50
0         0         0             1          20
1         1         1             1          9
1         0         1             0          8

Variance for LossType:

LossType = 0: mean = (10+5+20+8)/4 = 10.75, variance = [(10-10.75)^2 + (5-10.75)^2 + (20-10.75)^2 + (8-10.75)^2]/4 = 31.6875

LossType = 1: mean = (50+9)/2 = 29.5, variance = [(50-29.5)^2 + (9-29.5)^2]/2 = 420.25

Weighted sum of variances (LossType) = 4/6*(31.6875) + 2/6*(420.25) ≈ 161.21
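
As a check on the hand calculation, here is a minimal Python sketch that recomputes the weighted variance of the LossType split from the six rows of the input table. The names `rows` and `variance` are illustrative and not part of the original post. The sketch assumes population variance (dividing by n), as in the hand calculation.

```python
# Rows from the input table:
# (Location, LossType, FrontBumbper, RightSide, Duration(Days))
rows = [
    (0, 0, 1, 1, 10),
    (0, 0, 0, 1, 5),
    (0, 1, 1, 0, 50),
    (0, 0, 0, 1, 20),
    (1, 1, 1, 1, 9),
    (1, 0, 1, 0, 8),
]


def variance(values):
    """Population variance (divide by n), as in the hand calculation."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Split the target by LossType (column index 1).
left = [r[4] for r in rows if r[1] == 0]
right = [r[4] for r in rows if r[1] == 1]

var_left = variance(left)
var_right = variance(right)

# Weight each group's variance by its share of all samples.
n = len(rows)
weighted = len(left) / n * var_left + len(right) / n * var_right

print(var_left, var_right, weighted)
```

The weights use the total of six samples, so the two groups are weighted 4/6 and 2/6.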

However, the sklearn tree comes out as follows: it uses LossType <= 0.5 as the split condition, and my variance does not match that value.
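
One likely source of the mismatch is that the 0.5 in `LossType <= 0.5` is a split threshold, not a variance. scikit-learn places the threshold at the midpoint between adjacent distinct feature values, so a 0/1 feature always splits at 0.5. The sketch below is a simplified stand-in for that split search, not scikit-learn's actual implementation. It tries every midpoint threshold on every feature and keeps the one with the lowest weighted variance. The helper names `variance` and `best_split` are hypothetical.

```python
names = ["Location", "LossType", "FrontBumbper", "RightSide"]
X = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
y = [10, 5, 50, 20, 9, 8]


def variance(values):
    """Population variance (divide by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(X, y):
    """Return (weighted_variance, feature_index, threshold) of the best split."""
    best = None
    for j in range(len(X[0])):
        distinct = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            weighted = (
                len(left) * variance(left) + len(right) * variance(right)
            ) / len(y)
            if best is None or weighted < best[0]:
                best = (weighted, j, threshold)
    return best


weighted, j, threshold = best_split(X, y)
print(names[j], threshold, round(weighted, 4))
```

For this data the search selects LossType with threshold 0.5, and the weighted variance of that split is about 161.21. That figure is the one to compare with the hand calculation.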

tree view

Code:

import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Load the table shown above (path as given in the original post).
dataset = pd.read_excel("/home/datascience/Docume /decisiontreeclassifier/Data.xls")
print(dataset)

# The first four columns are the features; the fifth is the target, Duration(Days).
X = dataset.iloc[:, 0:4]
y = dataset.iloc[:, 4]
print(X)
print(y)

regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Predict for one sample, reusing the training column names.
pred_data = pd.DataFrame([[1, 0, 0, 0]], columns=X.columns)
y_pred = regressor.predict(pred_data)
print(y_pred)

# Export the fitted tree so it can be rendered with Graphviz.
export_graphviz(regressor, out_file='tree.dot',
                feature_names=['Location', 'LossType', 'FrontBumbper', 'RightSide'])
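
To compare against the fitted tree directly, the root node's statistics can be read from the estimator's `tree_` attribute. This is a minimal sketch. It assumes the six rows from the input table are entered inline instead of being read from the Excel file, and the variable names `root_feature`, `root_threshold`, `root_impurity` and `root_value` are illustrative.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# The six rows from the input table, entered inline (assumption: this
# matches the contents of Data.xls).
data = pd.DataFrame(
    [
        [0, 0, 1, 1, 10],
        [0, 0, 0, 1, 5],
        [0, 1, 1, 0, 50],
        [0, 0, 0, 1, 20],
        [1, 1, 1, 1, 9],
        [1, 0, 1, 0, 8],
    ],
    columns=["Location", "LossType", "FrontBumbper", "RightSide", "Duration(Days)"],
)
X = data.iloc[:, 0:4]
y = data.iloc[:, 4]

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)
tree = regressor.tree_

# Node 0 is the root of the fitted tree.
root_feature = X.columns[tree.feature[0]]  # feature used by the root split
root_threshold = tree.threshold[0]         # split threshold, not a variance
root_impurity = tree.impurity[0]           # variance of y over all samples
root_value = tree.value[0][0][0]           # mean of y over all samples

print(root_feature, root_threshold, root_impurity, root_value)
```

The impurity reported at the root is the variance of the target over all samples before any split, so it is not expected to equal the weighted variance of the two child groups either.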

0 answers:

No answers yet.