Error when using GridSearchCV, but not without GridSearchCV - Python 3.6.7

Asked: 2019-08-13 15:43:20

Tags: python python-3.x scikit-learn mlp

I am running into a strange error: my code fails when using GridSearchCV, but works fine when just running sklearn's MLPRegressor on its own.

The following code:

from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np

def str_to_num(arr):
    le = preprocessing.LabelEncoder()
    new_arr = le.fit_transform(arr)
    return new_arr

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

data = pd.read_csv('data2.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL']
# remove the header
label = label[1:]

# create the data, or the data that is to be estimated
data = data.drop('TOTAL', axis=1)
data = data.drop('SERIALNUM', axis=1)
# remove the header
data = data[1:]

# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

mlp = MLPRegressor(activation = 'relu', solver = 'lbfgs', verbose=False)
mlp.fit(X_train, y_train)
mlp_predictions = mlp.predict(X_test)
mlp_differences = compare_values(y_test, mlp_predictions)
mlp_Avg = np.average(mlp_differences)
print(mlp_Avg)

Prints the following:

32.92041129078561   (yes, I know the average error is terrible)
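
(As an aside, averaging the absolute differences from compare_values is just the mean absolute error, so a minimal equivalent check with scikit-learn's built-in metric would be the following. This snippet is not part of the original question; it assumes the same y_test and mlp_predictions as in the run above:)

from sklearn.metrics import mean_absolute_error

# Should print the same value as np.average(compare_values(y_test, mlp_predictions))
print(mean_absolute_error(y_test, mlp_predictions))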

However, when I try to optimize the parameters, the same settings produce an error:

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import preprocessing
import pandas as pd
import numpy as np


def str_to_num(arr):
    le = preprocessing.LabelEncoder()
    new_arr = le.fit_transform(arr)
    return new_arr

def compare_values(arr1, arr2):
    thediff = 0
    thediffs = []
    for thing1, thing2 in zip(arr1, arr2):
        thediff = abs(thing1 - thing2)
        thediffs.append(thediff)

    return thediffs

def print_to_file(filepath, arr):
    with open(filepath, 'w') as f:
        for item in arr:
            f.write("%s\n" % item)

data = pd.read_csv('data2.csv')

# create the labels, or field we are trying to estimate
label = data['TOTAL_DAYS_TO_COMPLETE']
# remove the header
label = label[1:]

# create the data, or the data that is to be estimated
data = data.drop('TOTAL_DAYS_TO_COMPLETE', axis=1)
data = data.drop('SERIALNUM', axis=1)
# remove the header
data = data[1:]

# split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size = 0.2)

param_grid = {
    #'hidden_layer_sizes': [(1,),(2,),(3,),(10,),(15,),(20,),(25,)],
    'activation': ['identity', 'logistic', 'relu'],
    #'activation': ['relu'],
    'solver': ['lbfgs', 'sgd', 'adam'],
    #'solver': ['adam']
    #'alpha': [0.0001, 0.0005, 0.0009],
    #'learning_rate': ['constant', 'invscaling', 'adaptive'],
    #'learning_rate_init': [0.001, 0.01, 0.99],
    #'warm_start': [True, False]
    #'momentum': [0.1, 0.9, 0.99]
    # Did not try solver-specific parameters... yet
}

# Create the base model
mlp = MLPRegressor()

# Instantiate the grid search
grid_search = GridSearchCV(estimator=mlp, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
print()
print(grid_search.best_params_)
print(grid_search.best_score_)
print()
print("Grid scores on development set: ")
print()
answers = grid_search.predict(X_test)
results = compare_values(answers, y_test)
print("Accuracy: ", np.average(results))
print()

Which produces the following output:

Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=sgd ..................................
C:\Python367-64\lib\site-packages\sklearn\neural_network\_base.py:195: RuntimeWarning: overflow encountered in square
  return ((y_true - y_pred) ** 2).mean() / 2
[CV] activation=identity, solver=adam ................................
[CV] activation=identity, solver=lbfgs ...............................
[CV] activation=identity, solver=sgd ..................................
[CV] activation=identity, solver=sgd ..................................

     

<extra lines that ran fine removed>

     

!!! THIS IS WHERE IT STARTS TO FAIL !!!

[CV] .................... activation=relu, solver=lbfgs, total=   0.5s

  

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Python367-64\lib\site-packages\joblib\externals\loky\process_executor.py", line 418, in _process_worker
    r = call_item()
  File "C:\Python367-64\lib\site-packages\joblib\externals\loky\process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "C:\Python367-64\lib\site-packages\joblib\_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 554, in _fit_and_score
    test_scores = _score(estimator, X_test, y_test, scorer, is_multimetric)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 597, in _score
    return _multimetric_score(estimator, X_test, y_test, scorer)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_validation.py", line 627, in _multimetric_score
    score = scorer(estimator, X_test, y_test)
  File "C:\Python367-64\lib\site-packages\sklearn\metrics\scorer.py", line 240, in _passthrough_scorer
    return estimator.score(*args, **kwargs)
  File "C:\Python367-64\lib\site-packages\sklearn\base.py", line 410, in score
    y_type, _, _, _ = _check_reg_targets(y, y_pred, None)
  File "C:\Python367-64\lib\site-packages\sklearn\metrics\regression.py", line 79, in _check_reg_targets
    y_pred = check_array(y_pred, ensure_2d=False)
  File "C:\Python367-64\lib\site-packages\sklearn\utils\validation.py", line 542, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "C:\Python367-64\lib\site-packages\sklearn\utils\validation.py", line 56, in _assert_all_finite
    raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
"""

     

The above exception was the direct cause of the following exception:

     

Traceback (most recent call last):
  File "mlp_optimizer.py", line 93, in <module>
    grid_search.fit(X_train, y_train)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 687, in fit
    self._run_search(evaluate_candidates)
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 1148, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "C:\Python367-64\lib\site-packages\sklearn\model_selection\_search.py", line 666, in evaluate_candidates
    cv.split(X, y, groups)))
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 934, in __call__
    self.retrieve()
  File "C:\Python367-64\lib\site-packages\joblib\parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Python367-64\lib\site-packages\joblib\_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "C:\Python367-64\lib\concurrent\futures\_base.py", line 432, in result
    return self.__get_result()
  File "C:\Python367-64\lib\concurrent\futures\_base.py", line 384, in __get_result
    raise self._exception
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Why does it fail when using GridSearchCV, but not when running without it?

1 Answer:

Answer 0 (score: 0):

The problem has to do with this line:

'solver': ['lbfgs', 'sgd', 'adam'],

Per the documentation, the sgd option requires certain parameters to be kept within certain thresholds.

Simply changing

'solver': ['lbfgs', 'sgd', 'adam'],

to

'solver': ['lbfgs', 'adam'],

solved the problem.
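
If you do want to keep 'sgd' in the grid, one common mitigation is to scale the features so the sgd solver is less likely to diverge into the NaN/inf predictions that the scorer rejects. A minimal sketch, not part of the original answer; it assumes the same X_train and y_train as in the question and that unscaled inputs are what makes sgd blow up:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

# Scale the inputs before the MLP; grid parameters for a pipeline step are
# addressed as '<step name>__<parameter>'.
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPRegressor()),
])

param_grid = {
    'mlp__activation': ['identity', 'logistic', 'relu'],
    'mlp__solver': ['lbfgs', 'sgd', 'adam'],
}

grid_search = GridSearchCV(estimator=pipe, param_grid=param_grid,
                           cv=3, n_jobs=-1, verbose=2)
# grid_search.fit(X_train, y_train)  # X_train / y_train as in the question

Whether this keeps every sgd fit finite still depends on the data and the learning rate, so dropping 'sgd' from the grid, as above, remains the simplest fix.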