我尝试了随机森林回归。
下面给出了代码。
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
np.random.seed(0)
d1 = np.random.randint(2, size=(50, 10))
d2 = np.random.randint(3, size=(50, 10))
d3 = np.random.randint(4, size=(50, 10))
Y = np.random.randint(7, size=(50,))
X = np.column_stack([d1, d2, d3])
n_smples, n_feats = X.shape
print (n_smples, n_feats)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
regr = RandomForestRegressor(max_features=None,random_state=0)
pipe = make_pipeline(RFECV(estimator=regr, step=3, cv=kf, scoring =
'neg_mean_squared_error', n_jobs=-1),
GridSearchCV(regr, param_grid={'n_estimators': [100, 300]},
cv=kf, scoring = 'neg_mean_squared_error',
n_jobs=-1))
ypredicts = cross_val_predict(pipe, X, Y, cv=kf, n_jobs=-1)
rmse = mean_squared_error(Y, ypredicts)
print (rmse)
但是,出现以下错误:
sklearn.exceptions.NotFittedError:未安装估算器,请在开发模型之前调用fit
。
我也尝试过:
model = pipe.fit(X,Y)
ypredicts = cross_val_predict(model, X, Y, cv=kf, n_jobs=-1)
但是有同样的错误。
编辑1: 我也尝试过:
pipe.fit(X,Y)
但是有同样的错误。
在Python 2.7(Sklearn 0.20)中,对于相同的代码,我得到了不同的错误:
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
在Python 2.7(Sklearn 0.20.3)中:
NotFittedError: Estimator not fitted, call
适合before exploiting the model.
答案 0 :(得分:0)
代替
model = pipe.fit(X,Y)
您尝试过
pipe.fit(X,Y)
相反?
那应该是
pipe.fit(X,Y)
# change model to pipe
ypredicts = cross_val_predict(pipe, X, Y, cv=kf, n_jobs=-1)
答案 1 :(得分:0)
您似乎正在尝试通过使用网格搜索为分类器选择最佳参数,这是另一种选择。您正在使用管道,但是在这种方法中,我没有使用管道,但通过随机搜索获得了最佳参数。
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
np.random.seed(0)
d1 = np.random.randint(2, size=(50, 10))
d2 = np.random.randint(3, size=(50, 10))
d3 = np.random.randint(4, size=(50, 10))
Y = np.random.randint(7, size=(50,))
X = np.column_stack([d1, d2, d3])
n_smples, n_feats = X.shape
print (n_smples, n_feats)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
regr = RandomForestRegressor(max_features=None,random_state=0)
n_iter_search = 20
random_search = RandomizedSearchCV(regr, param_distributions={'n_estimators': [100, 300]},
n_iter=20, cv=kf,verbose=1,return_train_score=True)
random_search.fit(X, Y)
ypredicts=random_search.predict(X)
rmse = mean_squared_error(Y, ypredicts)
print(rmse)
print(random_search.best_params_)
random_search.cv_results_
尝试这段代码。我希望这段代码能解决您的问题。