While optimizing a RandomForestRegressor (RFR) on Ubuntu with Python 3.6.4, I get the following traceback.
Traceback (most recent call last):
File "main.py", line 177, in <module>
model_test_path, save_predictions_path, args.classifier, args.criterion, args.num_threads, args.relu, args.verbose)
File "main.py", line 128, in main
model = train_model(train_p, val_p, gs_p, clfr, criterion, num_threads, verbose)
File "main.py", line 74, in train_model
model = gs.fit(X_train, y_train, X_val, y_val, scoring='neg_mean_squared_error', verbose=verbose)
File "/home/bram/.local/share/virtualenvs/dpc_cross_ml-DMx7zq90/lib/python3.6/site-packages/hypopt/model_selection.py", line 289, in fit
results = _parallel_param_opt(jobs, threads = self.num_threads)
File "/home/bram/.local/share/virtualenvs/dpc_cross_ml-DMx7zq90/lib/python3.6/site-packages/hypopt/model_selection.py", line 146, in _parallel_param_opt
results = pool.map(_run_thread_job, lst)
File "/home/bram/.pyenv/versions/3.6.4/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/home/bram/.pyenv/versions/3.6.4/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[(RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=500,
n_jobs=None, oob_score=False, random_state=0,
verbose=True, warm_start=False), -1.4925891438469021)]'. Reason: 'error("'i' format requires -2147483648 <= number <= 2147483647",)'
On Windows with Python 3.7.1, the traceback is slightly different.
Traceback (most recent call last):
File "c:\python\python37\Lib\threading.py", line 917, in _bootstrap_inner
self.run()
File "c:\python\python37\Lib\threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "c:\python\python37\Lib\multiprocessing\pool.py", line 496, in _handle_results
task = get()
File "c:\python\python37\Lib\multiprocessing\connection.py", line 250, in recv
buf = self._recv_bytes()
File "c:\python\python37\Lib\multiprocessing\connection.py", line 318, in _recv_bytes
return self._get_more_data(ov, maxsize)
File "c:\python\python37\Lib\multiprocessing\connection.py", line 337, in _get_more_data
assert left > 0
AssertionError
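As far as I understand, this happens because the multiprocessing package has trouble returning or pickling objects larger than 2GB. My output model is around 5GB.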
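To check that understanding, here is a minimal sketch of the limit named in the first traceback. As far as I can tell, multiprocessing on Python 3.6 and 3.7 prefixes each message sent between processes with its length packed as a signed 32-bit integer, which is where the `'i' format requires ...` message comes from. The helper name `fits_in_int32_header` is my own and is not part of multiprocessing or hypopt.

```python
import struct

# Largest value a signed 32-bit "i" field can hold (about 2 GiB when read as a byte count).
INT32_MAX = 2**31 - 1


def fits_in_int32_header(num_bytes):
    """Return True if a payload length can be packed as a signed 32-bit integer.

    Hypothetical helper: it only mimics the length-header packing that, to my
    understanding, multiprocessing performs on Python 3.6/3.7.
    """
    try:
        struct.pack("!i", num_bytes)
        return True
    except struct.error:
        # Same error class as the "Reason" in the traceback above.
        return False


five_gb = 5 * 1024**3  # rough size of my fitted model, in bytes

print(fits_in_int32_header(INT32_MAX))      # True
print(fits_in_int32_header(INT32_MAX + 1))  # False
print(fits_in_int32_header(five_gb))        # False
```

If this is right, any result returned from a worker that pickles to more than 2**31 - 1 bytes fails in the same way, regardless of what the object is.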
My code:
# load_data() loads all data in memory from a given path
X_train, y_train = load_data(train_p)
X_train = StandardScaler().fit_transform(X_train)
X_val, y_val = load_data(val_p)
X_val = StandardScaler().fit_transform(X_val)
# load grid from some path. Here the contents are 4 values for n_estimators
with open(str(gs_p), encoding='utf-8') as fhin:
    param_grid = json.load(fhin)
gs = GridSearch(model=RandomForestRegressor(verbose=True), num_threads=24, param_grid=param_grid)
model = gs.fit(X_train, y_train, X_val, y_val, scoring='neg_mean_squared_error', verbose=True) # this line throws the error
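For reference, this is roughly how the pickled size of a fitted model could be measured without holding a second multi-gigabyte copy in memory. It is only a sketch: `ByteCounter` and `pickled_size` are hypothetical helper names of mine, and it assumes the object can be pickled with the standard `pickle` module.

```python
import pickle


class ByteCounter:
    """File-like sink that counts the bytes written to it and discards them."""

    def __init__(self):
        self.total = 0

    def write(self, data):
        # memoryview(...).nbytes works for bytes, bytearray and memoryview alike.
        n = memoryview(data).nbytes
        self.total += n
        return n


def pickled_size(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return the number of bytes pickle would produce for obj."""
    counter = ByteCounter()
    pickle.dump(obj, counter, protocol=protocol)
    return counter.total


# Small stand-in object; in my case this would be the fitted model.
example = list(range(1000))
print(pickled_size(example))
```

Comparing `pickled_size(model)` against `2**31 - 1` would show whether a single result is too large to be sent back from a worker process on Python 3.6 or 3.7.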
Is there any way to avoid this? Am I missing something?