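I've been trying to use sklearn's GridSearchCV several times in the script below (once per logistic regression, three times in total). What happens is that the first GridSearch, for the first logistic regression, completes, but when the second grid search is about to start, the terminal just hangs and nothing happens.
I'm using Keras for the logistic regression.
I'd appreciate any feedback, since this problem is rather annoying.
PS. This is my first post, so I'm happy to provide more information if needed.
Here is the script: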
import numpy as np
from keras.callbacks import History
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def braf():
    # mutation_prediction, plot_loss, plot_accuracy and the X_/Y_ data frames are defined elsewhere in the script
    mut_pred = mutation_prediction(X_train_all_genes, Y_train_all_genes, X_valid=X_test_all_genes, Y_valid=Y_test_all_genes)
    print('Starting BRAF...')
    BRAF_history = History()
    braf_estimator = KerasClassifier(build_fn=mut_pred.braf_model, epochs=30, batch_size=15, verbose=0)
    # 5 x 5 grid over the learning rate and the L1 (lasso) penalty
    braf_param_grid = dict(braf_learning_rate=list(np.linspace(0, 0.0001, num=5)),
                           braf_lasso_rate=list(np.linspace(0, 0.0001, num=5)))
    braf_grid = GridSearchCV(estimator=braf_estimator, cv=2,
                             param_grid=braf_param_grid, n_jobs=30, pre_dispatch=5)
    braf_grid_result = braf_grid.fit(X_train_all_genes.values, Y_train_all_genes['BRAF_mutant'].values, callbacks=[BRAF_history])
    print('Done with BRAF')
    plot_loss(BRAF_history.history['loss'], title='BRAF LOSS')
    plot_accuracy(BRAF_history.history['acc'], title='BRAF Accuracy')
    BRAF_pred = [int(x) for x in braf_grid.predict(X_test_all_genes.values)]
    return BRAF_pred
def kras():
    mut_pred = mutation_prediction(X_train_all_genes, Y_train_all_genes, X_valid=X_test_all_genes, Y_valid=Y_test_all_genes)
    print('Starting KRAS...')
    KRAS_history = History()
    kras_estimator = KerasClassifier(build_fn=mut_pred.kras_model, epochs=30, batch_size=15, verbose=0)
    # 10 x 10 grid over the learning rate and the L1 (lasso) penalty
    kras_param_grid = dict(kras_learning_rate=list(np.linspace(0, 0.0001, num=10)),
                           kras_lasso_rate=list(np.linspace(0, 0.0001, num=10)))
    kras_grid = GridSearchCV(estimator=kras_estimator, cv=2, param_grid=kras_param_grid, n_jobs=30, pre_dispatch=5)
    # The hang happens inside this call (see the traceback below)
    kras_grid_result = kras_grid.fit(X_train_all_genes.values, Y_train_all_genes['KRAS_mutant'].values, callbacks=[KRAS_history])
    print('Done with KRAS')
    plot_loss(KRAS_history.history['loss'], title='KRAS LOSS')
    plot_accuracy(KRAS_history.history['acc'], title='KRAS Accuracy')
    KRAS_pred = [int(x) for x in kras_grid.predict(X_test_all_genes.values)]
    return KRAS_pred
def tp53():
    mut_pred = mutation_prediction(X_train_all_genes, Y_train_all_genes, X_valid=X_test_all_genes, Y_valid=Y_test_all_genes)
    print('Starting TP53...')
    TP53_history = History()
    tp53_estimator = KerasClassifier(build_fn=mut_pred.tp53_model, epochs=30, batch_size=15, verbose=0)
    # 10 x 10 grid over the learning rate and the L1 (lasso) penalty
    tp53_param_grid = dict(tp53_learning_rate=list(np.linspace(0, 0.001, num=10)),
                           tp53_lasso_rate=list(np.linspace(0, 0.0001, num=10)))
    tp53_grid = GridSearchCV(estimator=tp53_estimator, cv=2, param_grid=tp53_param_grid, n_jobs=30, pre_dispatch=5)
    tp53_grid_result = tp53_grid.fit(X_train_all_genes.values, Y_train_all_genes['TP53_mutant'].values, callbacks=[TP53_history])
    print('Done with TP53')
    plot_loss(TP53_history.history['loss'], title='TP53 LOSS')
    plot_accuracy(TP53_history.history['acc'], title='TP53 Accuracy')
    TP53_pred = [int(x) for x in tp53_grid.predict(X_test_all_genes.values)]
    return TP53_pred
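In my main(), I call the functions above to run LR on these three genes; each one returns the predictions made with the best combination of learning rate and lasso parameter.
Any feedback would help.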
Update: when I interrupt the process, I get the following:
Process ForkPoolWorker-58:
Process ForkPoolWorker-42:
Process ForkPoolWorker-56:
Process ForkPoolWorker-54:
Process ForkPoolWorker-52:
Process ForkPoolWorker-46:
Process ForkPoolWorker-40:
Process ForkPoolWorker-44:
Process ForkPoolWorker-38:
Process ForkPoolWorker-36:
Process ForkPoolWorker-60:
Process ForkPoolWorker-59:
Process ForkPoolWorker-43:
Process ForkPoolWorker-37:
Process ForkPoolWorker-39:
Process ForkPoolWorker-41:
Process ForkPoolWorker-45:
Process ForkPoolWorker-48:
Process ForkPoolWorker-53:
Process ForkPoolWorker-47:
Process ForkPoolWorker-57:
Process ForkPoolWorker-55:
Process ForkPoolWorker-49:
Process ForkPoolWorker-51:
Traceback (most recent call last):
  File "FinalProjectV1.py", line 354, in <module>
    main()
  File "FinalProjectV1.py", line 332, in main
    KRAS_pred_test=kras()
  File "FinalProjectV1.py", line 309, in kras
    kras_grid_result = kras_grid.fit(X_train_all_genes.values, Y_train_all_genes['KRAS_mutant'].values,callbacks=[KRAS_history])
  File "/soe/ianastop/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 639, in fit
    cv.split(X, y, groups)))
  File "/soe/ianastop/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 789, in __call__
    self.retrieve()
  File "/soe/ianastop/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/soe/ianastop/venv/lib/python3.6/multiprocessing/pool.py", line 638, in get
    self.wait(timeout)
  File "/soe/ianastop/venv/lib/python3.6/multiprocessing/pool.py", line 635, in wait
    self._event.wait(timeout)
  File "/soe/ianastop/venv/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/soe/ianastop/venv/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt
Could this be related to the multiprocessing library?
Answer (score: 0)
I can reproduce the error on a MacBook Pro.
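The problem here is the TensorFlow session. If a session has been created in the parent process before GridSearchCV.fit() is called, the fit will hang: the forked worker processes inherit a copy of the parent's TensorFlow runtime state, and that state is not safe to use after fork().
This is most likely also why the first grid search completes and only the second one hangs: with the default refit=True, GridSearchCV refits the best model in the parent process, which creates a session there before the next search forks its workers.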
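The mechanism can be reproduced without TensorFlow. The stdlib-only sketch below is an analogy, not TensorFlow code: it assumes the hang comes from fork() copying a lock that a background thread holds, with `runtime_lock` and `background_worker` as hypothetical stand-ins for the threads a TF session starts. A child forked before any such thread exists can take the lock; a child forked while the thread holds it cannot.

```python
import multiprocessing as mp
import threading

# Hypothetical stand-in for internal state of a multithreaded runtime such
# as a TensorFlow session: a lock that a background thread holds while busy.
runtime_lock = threading.Lock()


def background_worker(started, stop):
    # Simulates a runtime thread that keeps the lock while it is working.
    with runtime_lock:
        started.set()
        stop.wait(timeout=5)


def child_task(queue):
    # A forked child contains only the thread that called fork(). If the lock
    # was held at fork time, the child's copy stays held forever, because the
    # thread that would release it does not exist in the child.
    queue.put(runtime_lock.acquire(timeout=0.5))


def lock_available_in_forked_child():
    ctx = mp.get_context("fork")  # fork start method (Linux/macOS only)
    queue = ctx.Queue()
    proc = ctx.Process(target=child_task, args=(queue,))
    proc.start()
    result = queue.get(timeout=10)
    proc.join()
    return result


def demo():
    before = lock_available_in_forked_child()  # no runtime thread running yet

    started, stop = threading.Event(), threading.Event()
    thread = threading.Thread(target=background_worker, args=(started, stop))
    thread.start()
    started.wait()  # make sure the lock is held before forking
    after = lock_available_in_forked_child()
    stop.set()
    thread.join()
    return before, after


if __name__ == "__main__":
    print(demo())  # (True, False): the child is stuck only when forked after the thread started
```

With n_jobs > 1, the joblib bundled with this sklearn version runs its workers in a fork-based multiprocessing pool (the ForkPoolWorker processes in the traceback), so the same situation can arise once TensorFlow has started its threads in the parent.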
One possible solution is to confine all session-creation code to the KerasClassifier class and the model-creation function.
In addition, you may want to limit TensorFlow's memory usage inside the model-creation function or in a subclass of KerasClassifier.
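A sketch of the model-creation-function approach, assuming TensorFlow 1.x with standalone Keras 2.x: the session is created inside a hypothetical build function, `build_lr_model`, with limited threads and on-demand GPU memory, rather than in module-level code. The function name and its `input_dim`/`learning_rate`/`lasso_rate` arguments are illustrative; the poster's own `braf_model` and related functions are not shown in the question.

```python
def build_lr_model(input_dim, learning_rate=0.01, lasso_rate=0.0):
    """Hypothetical Keras build_fn for an L1-penalised logistic regression.

    Assumes TensorFlow 1.x with standalone Keras 2.x. The imports and the
    session creation happen only when this function runs, i.e. inside
    whichever process builds the model.
    """
    import tensorflow as tf
    from keras import backend as K
    from keras.layers import Dense
    from keras.models import Sequential
    from keras.optimizers import SGD
    from keras.regularizers import l1

    # Drop the previous model's graph and session in this process.
    K.clear_session()
    # One thread per op pool, so 30 parallel workers do not oversubscribe the CPU.
    config = tf.ConfigProto(intra_op_parallelism_threads=1,
                            inter_op_parallelism_threads=1)
    # Allocate GPU memory on demand instead of reserving all of it up front.
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))

    # Logistic regression: one sigmoid unit with an L1 (lasso) weight penalty.
    model = Sequential()
    model.add(Dense(1, activation="sigmoid", input_dim=input_dim,
                    kernel_regularizer=l1(lasso_rate)))
    model.compile(optimizer=SGD(lr=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

It would be wrapped as KerasClassifier(build_fn=build_lr_model, input_dim=X_train_all_genes.shape[1], epochs=30, batch_size=15, verbose=0), with param_grid keys learning_rate and lasso_rate, since KerasClassifier only accepts parameters that match the build function's (or fit/predict's) arguments. Note that with refit=True, GridSearchCV still calls the build function once in the parent process for the final refit.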
Quick fix: set n_jobs=1, so that no worker processes are forked,
but the search will then take a long time to finish.
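To put "a long time" in numbers, here is a small worked count of how many models the three searches in the question train, since with n_jobs=1 they all run one after another. `n_fits` is a hypothetical helper; the count is just grid size × CV folds, plus one refit of the best combination.

```python
def n_fits(grid_sizes, cv, refit=True):
    """Number of model fits GridSearchCV performs.

    One fit per (parameter combination, CV fold), plus one final refit of the
    best combination on the full training set when refit=True.
    """
    n_combinations = 1
    for size in grid_sizes:
        n_combinations *= size
    return n_combinations * cv + (1 if refit else 0)


# Grid sizes from the question: BRAF uses 5 x 5 values, KRAS and TP53 10 x 10, all with cv=2.
searches = {"BRAF": (5, 5), "KRAS": (10, 10), "TP53": (10, 10)}

if __name__ == "__main__":
    for gene, sizes in searches.items():
        print(gene, n_fits(sizes, cv=2))  # BRAF 51, KRAS 201, TP53 201
    total = sum(n_fits(sizes, cv=2) for sizes in searches.values())
    print("total", total)  # 453 fits, each training for 30 epochs
```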
References:
Session hang issue with python multiprocessing
Keras + Tensorflow and Multiprocessing in Python
Limit the resource usage for tensorflow backend
GridSearchCV Hangs On Second Run
scikit-learn GridSearchCV n_jobs != 1 freezing
keras + scikit-learn wrapper, appears to hang when GridSearchCV with n_jobs >1