multiprocessing.pool.MaybeEncodingError when combining hypopt GridSearch with RFR

Date: 2019-06-21 14:23:37

Tags: python python-3.x scikit-learn multiprocessing hyperparameters

While optimizing an RFR (RandomForestRegressor) on Ubuntu with Python 3.6.4, I get the following traceback:

Traceback (most recent call last):
  File "main.py", line 177, in <module>
    model_test_path, save_predictions_path, args.classifier, args.criterion, args.num_threads, args.relu, args.verbose)
  File "main.py", line 128, in main
    model = train_model(train_p, val_p, gs_p, clfr, criterion, num_threads, verbose)
  File "main.py", line 74, in train_model
    model = gs.fit(X_train, y_train, X_val, y_val, scoring='neg_mean_squared_error', verbose=verbose)
  File "/home/bram/.local/share/virtualenvs/dpc_cross_ml-DMx7zq90/lib/python3.6/site-packages/hypopt/model_selection.py", line 289, in fit
    results = _parallel_param_opt(jobs, threads = self.num_threads)
  File "/home/bram/.local/share/virtualenvs/dpc_cross_ml-DMx7zq90/lib/python3.6/site-packages/hypopt/model_selection.py", line 146, in _parallel_param_opt
    results = pool.map(_run_thread_job, lst)
  File "/home/bram/.pyenv/versions/3.6.4/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/bram/.pyenv/versions/3.6.4/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=500,
                      n_jobs=None, oob_score=False, random_state=0,
                      verbose=True, warm_start=False), -1.4925891438469021)]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'

On Windows with Python 3.7.1, the traceback is slightly different:

Traceback (most recent call last):
  File "c:\python\python37\Lib\threading.py", line 917, in _bootstrap_inner
    self.run()
  File "c:\python\python37\Lib\threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "c:\python\python37\Lib\multiprocessing\pool.py", line 496, in _handle_results
    task = get()
  File "c:\python\python37\Lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "c:\python\python37\Lib\multiprocessing\connection.py", line 318, in _recv_bytes
    return self._get_more_data(ov, maxsize)
  File "c:\python\python37\Lib\multiprocessing\connection.py", line 337, in _get_more_data
    assert left > 0
AssertionError

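As far as I understand, this happens because multiprocessing has trouble returning (that is, pickling and sending back) objects larger than 2 GB. The Linux message says the 'i' struct format, a signed 32-bit integer, only accepts values up to 2147483647, which is just under 2 GiB. My fitted output model is about 5 GB.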
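To make the limit concrete, here is a small stdlib-only sketch. It assumes, as the Linux traceback suggests, that each message sent through a multiprocessing pipe is prefixed with a 4-byte length header packed with struct's '!i' format (this is how Python 3.6's multiprocessing.connection does it on Unix). The helper names pack_length_header, pickled_size and fits_in_one_message are hypothetical illustrations, not part of any library.

```python
import pickle
import struct

# Largest value a signed 32-bit int ("i" in struct) can hold:
# 2**31 - 1 bytes, i.e. just under 2 GiB.
MAX_MESSAGE_BYTES = 2**31 - 1


def pack_length_header(n_bytes):
    """Hypothetical helper mimicking the 4-byte big-endian length header.

    Raises struct.error when n_bytes does not fit in a signed 32-bit int,
    the same error reported in the Linux traceback.
    """
    return struct.pack("!i", n_bytes)


def pickled_size(obj):
    """Approximate number of bytes a pool worker would have to send for obj.

    Note: this builds the whole pickle in memory, so it is expensive
    for a multi-GB model.
    """
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def fits_in_one_message(obj):
    """True if the pickled object fits under a single 32-bit length header."""
    return pickled_size(obj) <= MAX_MESSAGE_BYTES


if __name__ == "__main__":
    print(len(pack_length_header(MAX_MESSAGE_BYTES)))  # 4-byte header, fine
    try:
        pack_length_header(5 * 10**9)  # roughly the 5 GB model from the question
    except struct.error as exc:
        print("struct.error:", exc)
    print(fits_in_one_message({"n_estimators": 500}))
```

My understanding is that Python 3.8 raised this limit for non-Windows connections by switching to a larger header for big messages, but that should be checked against the CPython changelog rather than taken from this sketch.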

My code:

import json

from hypopt import GridSearch
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# load_data() loads all data in memory from a given path
X_train, y_train = load_data(train_p)
X_val, y_val = load_data(val_p)

# Fit the scaler on the training data only and reuse it for the validation
# data, so both sets are scaled with the same statistics
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)

# load grid from some path. Here the contents are 4 values for n_estimators
with open(str(gs_p), encoding='utf-8') as fhin:
    param_grid = json.load(fhin)

gs = GridSearch(model=RandomForestRegressor(verbose=True), num_threads=24, param_grid=param_grid)
model = gs.fit(X_train, y_train, X_val, y_val, scoring='neg_mean_squared_error', verbose=True)  # this line throws the error

Is there any way to avoid this? Am I missing something?
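One possible workaround, sketched under assumptions: it swaps hypopt for scikit-learn's own GridSearchCV with a PredefinedSplit, so every candidate is still scored on the fixed validation set, but the worker processes only send scores back, never fitted models. The best parameters are then refit once in the main process on the training data. The function name search_then_refit is hypothetical, and this is not hypopt's API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, PredefinedSplit


def search_then_refit(X_train, y_train, X_val, y_val, param_grid, n_jobs=-1):
    """Hypothetical helper: grid search on a fixed train/validation split
    without shipping fitted models between processes.

    Workers return only scores; the winning configuration is refit once
    in the calling process on the training data.
    """
    # Stack train and validation so GridSearchCV sees one dataset.
    X = np.concatenate([X_train, X_val])
    y = np.concatenate([y_train, y_val])
    # -1 = always in the training part, 0 = member of the single validation fold.
    test_fold = np.concatenate([
        np.full(len(X_train), -1, dtype=int),
        np.zeros(len(X_val), dtype=int),
    ])

    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid,
        scoring="neg_mean_squared_error",
        cv=PredefinedSplit(test_fold),
        n_jobs=n_jobs,
        refit=False,  # skip the built-in refit, which would use train + val
    )
    search.fit(X, y)

    # Refit the best configuration here, so the large fitted model never
    # has to be pickled and sent back through a multiprocessing pipe.
    best_model = RandomForestRegressor(random_state=0, **search.best_params_)
    best_model.fit(X_train, y_train)
    return best_model, search.best_params_, search.best_score_
```

For example, `model, params, score = search_then_refit(X_train, y_train, X_val, y_val, param_grid, n_jobs=24)` could stand in for the gs.fit(...) call in the code above. The cost is one extra fit of the winning configuration, in exchange for never moving the ~5 GB model between processes.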
