执行集成方法的最佳实践

时间:2019-07-19 21:56:06

标签: python python-3.x scikit-learn ensemble-learning

我正在运行下面的示例代码。

df = pd.read_csv('C:\\my_path\\test.csv', header=0, encoding = 'unicode_escape')
df = df.fillna(0)

X = df1.drop(columns = ['PRICE','MATURITYDATE'])
y = df1['PRICE']

from sklearn.model_selection import train_test_split
#split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
#create new a knn model
knn = KNeighborsClassifier()
#create a dictionary of all values we want to test for n_neighbors
params_knn = {'n_neighbors': np.arange(1, 25)}
#use gridsearch to test all values for n_neighbors
knn_gs = GridSearchCV(knn, params_knn, cv=5)
#fit model to training data
knn_gs.fit(X_train, y_train)

#save best model
knn_best = knn_gs.best_estimator_
#check best n_neigbors value
print(knn_gs.best_params_)



# RANDOM FOREST
from sklearn.ensemble import RandomForestClassifier
#create a new random forest classifier
rf = RandomForestClassifier()
#create a dictionary of all values we want to test for n_estimators
params_rf = {'n_estimators': [50, 100, 200]}
#use gridsearch to test all values for n_estimators
rf_gs = GridSearchCV(rf, params_rf, cv=5)
#fit model to training data
rf_gs.fit(X_train, y_train)

#save best model
rf_best = rf_gs.best_estimator_
#check best n_estimators value
print(rf_gs.best_params_)


# REGRESSION
from sklearn.linear_model import LogisticRegression
#create a new logistic regression model
log_reg = LogisticRegression()
#fit the model to the training data
log_reg.fit(X_train, y_train)

#test the three models with the test data and print their accuracy scores
print('knn: {}'.format(knn_best.score(X_test, y_test)))
print('rf: {}'.format(rf_best.score(X_test, y_test)))
print('log_reg: {}'.format(log_reg.score(X_test, y_test)))


# VOTING CLASSIFIER
from sklearn.ensemble import VotingClassifier
#create a dictionary of our models
estimators=[('knn', knn_best), ('rf', rf_best), ('log_reg', log_reg)]
#create our voting classifier, inputting our models
ensemble = VotingClassifier(estimators, voting='hard')

全部来自下面的链接

https://towardsdatascience.com/ensemble-learning-using-scikit-learn-85c4531ff86a

我遇到的每种方法的问题始终是:

ValueError: Unknown label type: 'continuous'

我想一切都需要转换为分类类型,或者也许需要应用一种热编码。它是否正确?解决此类问题的最佳方法是什么?我希望在不引入自定义编码的情况下保持简单和通用。这就是为什么我倾向于scikit-learn库的原因。我将不胜感激任何/所有的想法和见解。非常感谢!

0 个答案:

没有答案