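Implementing DecisionTreeRegressor with sklearn: how is the root node split value calculated?
I calculated the variance for LossType by hand, but the tree sklearn builds shows the value 0.5 for LossType, and my value is different.
Input: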
Location  LossType  FrontBumbper  RightSide  Duration(Days)
0         0         1             1          10
0         0         0             1          5
0         1         1             0          50
0         0         0             1          20
1         1         1             1          9
1         0         1             0          8
Variance for LossType:
LossType = 0: mean = (10 + 5 + 20 + 8) / 4 = 10.75
variance = [(10 - 10.75)^2 + (5 - 10.75)^2 + (20 - 10.75)^2 + (8 - 10.75)^2] / 4 = 31.6875
LossType = 1: mean = (50 + 9) / 2 = 29.5
variance = [(50 - 29.5)^2 + (9 - 29.5)^2] / 2 = 420.25
Sum of weighted variance (LossType) = 4/6 * 31.6875 + 2/6 * 420.25 ≈ 161.21
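The hand calculation can be double-checked with a few lines of standard-library Python. This is only a sketch: the rows are copied from the input table, the helper name `weighted_variance` is made up for illustration, and it assumes the population variance (divide by n) weighted by each side's share of the 6 samples.

```python
from statistics import pvariance

# Rows copied from the input table:
# (Location, LossType, FrontBumbper, RightSide, Duration(Days))
rows = [
    (0, 0, 1, 1, 10),
    (0, 0, 0, 1, 5),
    (0, 1, 1, 0, 50),
    (0, 0, 0, 1, 20),
    (1, 1, 1, 1, 9),
    (1, 0, 1, 0, 8),
]

def weighted_variance(rows, feature_idx, threshold=0.5, target_idx=4):
    """Weighted population variance of the target after splitting on
    feature <= threshold (hypothetical helper, not part of sklearn)."""
    left = [r[target_idx] for r in rows if r[feature_idx] <= threshold]
    right = [r[target_idx] for r in rows if r[feature_idx] > threshold]
    n = len(rows)
    # Each side's variance is weighted by its share of the samples.
    return len(left) / n * pvariance(left) + len(right) / n * pvariance(right)

# Column 1 is LossType.
loss_type_score = weighted_variance(rows, feature_idx=1)
print(loss_type_score)
```

Calling the same helper with `feature_idx` set to 0, 2 or 3 gives the corresponding figure for the other three columns, which makes it easy to compare candidate splits.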
But the tree sklearn builds shows LossType <= 0.5 as the condition at this node,
and my variance does not match it.
Code:
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# Load the table shown above from the spreadsheet.
dataset = pd.read_excel("/home/datascience/Docume /decisiontreeclassifier/Data.xls")
print(dataset)

# Features are the first four columns; the target is Duration(Days).
X = dataset.iloc[:, 0:4]
print(X)
y = dataset.iloc[:, 4]
print(y)

# Fit a regression tree with the default settings.
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

# Predict for Location=1, LossType=0, FrontBumbper=0, RightSide=0.
pred_data = [[1, 0, 0, 0]]
y_pred = regressor.predict(pred_data)
print(y_pred)

# Export the fitted tree in Graphviz format.
export_graphviz(regressor, out_file='tree.dot', feature_names=['Location', 'LossType', 'FrontBumbper', 'RightSide'])
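To see which numbers sklearn actually stores for the root node, the fitted tree's arrays can be read directly. The sketch below is an assumption-laden check rather than a definitive answer: it rebuilds the table inline instead of reading the spreadsheet, and it assumes the default criterion (squared error, so each node's `impurity` is the variance of its samples). In that reading, 0.5 would be the split threshold between the feature values 0 and 1, not a variance.

```python
from sklearn.tree import DecisionTreeRegressor

feature_names = ["Location", "LossType", "FrontBumbper", "RightSide"]

# Same rows as the input table, entered inline (assumption: the
# spreadsheet holds exactly these six rows).
X = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
]
y = [10, 5, 50, 20, 9, 8]

regressor = DecisionTreeRegressor(random_state=0).fit(X, y)
tree = regressor.tree_

root = 0  # node 0 is the root
left = tree.children_left[root]
right = tree.children_right[root]

root_feature = feature_names[tree.feature[root]]
root_threshold = tree.threshold[root]

# Sample-weighted impurity of the two children, comparable to the
# hand-computed "sum of weighted variance".
weighted_child_impurity = (
    tree.n_node_samples[left] * tree.impurity[left]
    + tree.n_node_samples[right] * tree.impurity[right]
) / tree.n_node_samples[root]

print("split:", root_feature, "<=", root_threshold)
print("root impurity:", tree.impurity[root])
print("left impurity:", tree.impurity[left], "samples:", tree.n_node_samples[left])
print("right impurity:", tree.impurity[right], "samples:", tree.n_node_samples[right])
print("weighted child impurity:", weighted_child_impurity)
```

The per-child impurities printed here are the values to compare against the hand-computed variances for LossType = 0 and LossType = 1.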