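I am trying to use DecisionTreeRegressor from sklearn in Python to find the dependency between two variables: pressure on the X axis and received optical power on the y axis. Both parameters are measured once per minute.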
When I used polyval and polyfit in MATLAB, I was able to extract an actual prediction equation that more or less described the relationship received_optical_power = f(pressure).
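For comparison, the polynomial approach can be sketched in NumPy roughly as follows. The synthetic data, the variable names `pressure` and `power`, and the degree of 2 are assumptions made only for illustration; they are not the real measurements.

```python
import numpy as np

# Synthetic stand-in data (assumption): a noisy quadratic, not the real measurements
rng = np.random.RandomState(0)
pressure = np.linspace(990.0, 1030.0, 200)
power = 0.002 * (pressure - 1010.0) ** 2 - 20.0 + rng.normal(0.0, 0.05, pressure.size)

# Centre x before fitting to keep the least-squares problem well conditioned
x = pressure - pressure.mean()

# Coefficients come back highest power first, as with MATLAB's polyfit
coeffs = np.polyfit(x, power, deg=2)

# Evaluate the fitted polynomial at the measured points, as with MATLAB's polyval
fitted = np.polyval(coeffs, x)

# Residuals and root-mean-square error of the fitted curve
residuals = power - fitted
rmse = np.sqrt(np.mean(residuals ** 2))

print("coefficients:", coeffs)
print("RMSE:", rmse)
```

Here the fit yields an explicit closed-form equation (the three coefficients), which is exactly what a decision tree does not provide directly.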
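I think my question is basically how to evaluate the output of the analysis when using DecisionTreeRegressor. I mean the actual equation, the residuals, and how to compute the actual error of the extracted curve.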
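To make the terms concrete, here is a minimal sketch of what "residuals" and "error" could mean for a tree regressor. It runs on synthetic data (an assumption, in the style of the sklearn example), and the train/test split, `max_depth=5`, and the choice of MSE, RMSE, and R² as metrics are illustrative choices rather than the only sensible ones.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): noisy sine curve, not the real measurements
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, X.shape[0])

# Hold out part of the data so the error is not measured on the training points
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeRegressor(max_depth=5, random_state=0)
model.fit(X_train, y_train)

# Residuals: measured value minus predicted value on the held-out points
y_pred = model.predict(X_test)
residuals = y_test - y_pred

# Common summary errors of the fitted curve
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("MSE: %.4f  RMSE: %.4f  R^2: %.4f" % (mse, rmse, r2))
```

Measuring the error on held-out points matters here because a deep tree can reproduce its training data almost exactly, which would make the training error look misleadingly small.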
I am using Jupyter notebooks with Python on localhost for my project, because my input file final.merged.txt is 10 MB.
print(__doc__)
# Import the necessary modules and libraries
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
# Load the dataset (the random data from the sklearn example is left commented out)
rng = np.random.RandomState(1)
# X = np.sort(5 * rng.rand(80, 1), axis=0)
# y = np.sin(X).ravel()
# y[::5] += 3 * (0.5 - rng.rand(16))
X = pd.read_csv('final.merged.txt', sep=";", usecols=[3])  # -- this works !!
#X = pd.read_csv('final.merged.txt',sep = ";",usecols=(3,5,6,7,9,10))
#X = X.loc[:,'avgPressure'].values
y = pd.read_csv('final.merged.txt',sep = ";",usecols=[1])
# y = y.ix[:,0]
y = y.loc[:,'received optical power'].values
# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=10)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)
# Predict
#X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
#X_test =xrange(X.min(),X.max())
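# Evenly spaced grid over the observed X range, with as many points as X
X_test = np.linspace(X.values.min(), X.values.max(), len(X))
print("vector y: %s\nvector X: %s\nX_test: %s" % (len(y), len(X), len(X_test)))
len(X_test) == len(y)  # compare by value; "is" tests object identity
# Reuse the training column name so predict() sees the same feature name
X_test = pd.DataFrame(X_test, columns=X.columns)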
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)
# Plot the results
plt.figure()
plt.scatter(X, y, c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.plot(X_test, y_3, color="red", label="max_depth=10", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
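A fitted tree has no single closed-form equation; it is a piecewise-constant function defined by threshold rules. The closest thing to an "equation" is a dump of those rules, which the sketch below produces with sklearn's `export_text`. It uses synthetic data (an assumption), and the feature name `pressure` is only an illustrative label.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in data (assumption): sine curve, not the real measurements
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()

# A shallow tree keeps the rule listing short enough to read
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Text listing of the split thresholds and the constant value in each leaf
rules = export_text(tree, feature_names=["pressure"])
print(rules)

# The prediction can only take as many distinct values as there are leaves
n_leaves = tree.get_n_leaves()
distinct_predictions = np.unique(tree.predict(X))
print("leaves:", n_leaves, "distinct predicted values:", len(distinct_predictions))
```

Each leaf line in the printed rules gives the constant predicted on one interval of the input, so the curves plotted above are step functions whose number of steps grows with `max_depth`.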
I would greatly appreciate any suggestions.