Hyperparameter optimization in sklearn with n_jobs > 1: pickling

Date: 2017-03-16 19:43:41

Tags: python scikit-learn multiprocessing pickle keras

I am running into a "pickle" error. Here is the structure of my code:

  • A base class that serves as an abstract class
  • A subclass that can be instantiated
    • A method that sets the parameters and calls GridSearchCV or RandomizedSearchCV with n_jobs=-1
      • A local function create_model that creates the neural network model called by KerasClassifier or KerasRegressor (see this tutorial)

I get an error message saying that a local object cannot be pickled. If I change to n_jobs=1, everything is fine, so I suspect the problem is the combination of the local function and parallel processing. Is there a way around this? After some googling, it seems the serializer dill could work here (I even found a package called multiprocessing_on_dill), but I currently depend on the sklearn package.
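The error does not actually depend on keras or sklearn: it is a property of pickle, which joblib uses to ship the work to separate processes when n_jobs != 1. A minimal sketch of the failure, with `Trainer` and `build_search` as hypothetical stand-ins for the subclass and method described above:

```python
import pickle


class Trainer:
    def build_search(self):
        # Local function, analogous to create_model defined inside the
        # subclass method in the question
        def create_model():
            return "model"

        return create_model


fn = Trainer().build_search()

# joblib pickles the callable for each worker when n_jobs != 1;
# a function defined inside another function/method has no importable
# qualified name, so pickling it fails
try:
    pickle.dumps(fn)
    picklable = True
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    picklable = False
    print("pickling failed:", exc)
```

With n_jobs=1 nothing is ever pickled, which is why the same code runs fine serially.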

2 answers:

Answer 0 (score: 2)

I found a "solution" to my problem. I am really confused why the example here works with create_model while my code does not. The problem seems to be that the local function resides inside a method of the subclass. If I instead make the local function a method of the subclass, I can set n_jobs > 1 with RandomizedSearchCV.
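The difference can be seen with pickle alone: a bound method of a top-level class is pickled by its qualified name, while a function defined inside a method cannot be located that way. A small sketch with hypothetical names:

```python
import pickle


class Subclass:
    # create_model promoted from a local function to a method of the
    # subclass: pickle can now find it via Subclass.create_model, so
    # joblib can ship it to worker processes with n_jobs > 1
    def create_model(self):
        return "model"


bound = Subclass().create_model
restored = pickle.loads(pickle.dumps(bound))  # round-trips without error
print(restored())
```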

To recap, here is the structure of my code:

  • A base class that serves as an abstract class
  • A subclass that can be instantiated
    • A method that sets the parameters and calls GridSearchCV with n_jobs=-1
    • A method create_model that creates the neural network model called by KerasClassifier or KerasRegressor

```python
from abc import ABCMeta, abstractmethod

import numpy as np
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV


class MLAlgorithms(metaclass=ABCMeta):
    def __init__(self, X_train, y_train, X_test, y_test=None):
        """
        Constructor with train and test data.

        :param X_train: Train descriptor data
        :param y_train: Train observed data
        :param X_test: Test descriptor data
        :param y_test: Test observed data
        """
        ...

    @abstractmethod
    def setmlalg(self, mlalg):
        """
        Sets a machine learning algorithm.

        :param mlalg: Dictionary of the machine learning algorithm.
        """
        pass

    @abstractmethod
    def fitmlalg(self, mlalg, rid=None):
        """
        Fits a machine learning algorithm.

        :param mlalg: Machine learning algorithm
        """
        pass


class MLClassification(MLAlgorithms):
    """ Main class for classification machine learning algorithms. """

    def setmlalg(self, mlalg):
        """
        Sets a classification machine learning algorithm.

        :param mlalg: Dictionary of the classification machine learning algorithm.
        """
        ...

    def fitmlalg(self, mlalg):
        """
        Fits a classification machine learning algorithm.

        :param mlalg: Classification machine learning algorithm
        """
        ...

    # Function to create model, required for KerasClassifier
    def create_model(self, n_layers=1, units=10, input_dim=10, output_dim=1,
                     optimizer="rmsprop", loss="binary_crossentropy",
                     kernel_initializer="glorot_uniform", activation="sigmoid",
                     kernel_regularizer="l2", kernel_regularizer_weight=0.01,
                     lr=0.01, momentum=0.0, decay=0.0, nesterov=False,
                     rho=0.9, epsilon=1E-8, beta_1=0.9, beta_2=0.999,
                     schedule_decay=0.004):
        from keras.models import Sequential
        from keras.layers import Dense
        from keras import regularizers, optimizers

        # Create model
        if kernel_regularizer.lower() == "l1":
            kernel_regularizer = regularizers.l1(l=kernel_regularizer_weight)
        elif kernel_regularizer.lower() == "l2":
            kernel_regularizer = regularizers.l2(l=kernel_regularizer_weight)
        elif kernel_regularizer.lower() == "l1_l2":
            kernel_regularizer = regularizers.l1_l2(l1=kernel_regularizer_weight,
                                                    l2=kernel_regularizer_weight)
        else:
            print("Warning: Kernel regularizer {0} not supported. "
                  "Using default 'l2' regularizer.".format(kernel_regularizer))
            kernel_regularizer = regularizers.l2(l=kernel_regularizer_weight)

        if optimizer.lower() == "sgd":
            optimizer = optimizers.sgd(lr=lr, momentum=momentum, decay=decay,
                                       nesterov=nesterov)
        elif optimizer.lower() == "rmsprop":
            optimizer = optimizers.rmsprop(lr=lr, rho=rho, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adagrad":
            optimizer = optimizers.adagrad(lr=lr, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adadelta":
            optimizer = optimizers.adadelta(lr=lr, rho=rho, epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adam":
            optimizer = optimizers.adam(lr=lr, beta_1=beta_1, beta_2=beta_2,
                                        epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "adamax":
            optimizer = optimizers.adamax(lr=lr, beta_1=beta_1, beta_2=beta_2,
                                          epsilon=epsilon, decay=decay)
        elif optimizer.lower() == "nadam":
            optimizer = optimizers.nadam(lr=lr, beta_1=beta_1, beta_2=beta_2,
                                         epsilon=epsilon, schedule_decay=schedule_decay)
        else:
            print("Warning: Optimizer {0} not supported. "
                  "Using default 'sgd' optimizer.".format(optimizer))
            optimizer = "sgd"

        model = Sequential()
        model.add(Dense(units=units, input_dim=input_dim,
                        kernel_initializer=kernel_initializer, activation=activation,
                        kernel_regularizer=kernel_regularizer))
        for layer_count in range(n_layers - 1):
            model.add(Dense(units=units, kernel_initializer=kernel_initializer,
                            activation=activation, kernel_regularizer=kernel_regularizer))
        model.add(Dense(units=output_dim, kernel_initializer=kernel_initializer,
                        activation=activation, kernel_regularizer=kernel_regularizer))

        # Compile model
        model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
        return model


class MLRegression(MLAlgorithms):
    """ Main class for regression machine learning algorithms. """
    ...
```

The general concept of the code is shown above.


Answer 1 (score: 0)

I can confirm the same problem when running sklearn's grid search on a KerasClassifier model in parallel (n_jobs > 1) on Windows in a Jupyter notebook / IPython (no problem on Unix).

I solved it by putting the create_model function that caused the pickling problem into a module and importing that module, instead of defining the function in the working environment.

To create a simple module for Python:

  • Create a text file in the same folder as your main code and save it as my_module.py
  • Put the definition of the create_model function into that file
  • Instead of defining create_model in your code, import the module with import my_module and call the function from the module as my_module.create_model()
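The steps above can be sketched end to end. For the demonstration the module file is written from Python and create_model is a stub that just returns a string; in practice you would save the file yourself and put the real keras code from the question into it:

```python
import importlib
import pathlib
import pickle
import sys
import tempfile

# Step 1: create my_module.py (here in a temp folder so the sketch is
# self-contained; normally it sits next to your main script)
folder = pathlib.Path(tempfile.mkdtemp())
(folder / "my_module.py").write_text(
    "def create_model():\n"
    "    return 'model'\n"
)

# Step 2: import the module instead of defining create_model inline
sys.path.insert(0, str(folder))
my_module = importlib.import_module("my_module")

# Step 3: a module-level function is pickled by reference (module name +
# function name), so grid search with n_jobs > 1 can send it to workers
restored = pickle.loads(pickle.dumps(my_module.create_model))
print(restored())
```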