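I have to train a LightGBM model on a dataset covering 18 years. To do this, I thought of training the model in batches, year by year: train a first model on the first year's data, save it, then continue training the saved model on the next year's data, and so on. I used the following code: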
import os
import time
import lightgbm as lgb

# params, train_data, valid_data, model_path and prev_model_path are defined elsewhere.
if not os.path.isfile(prev_model_path):
    # No saved model yet: train from scratch on the first year's data.
    print('training model...')
    t1 = time.time()
    model = lgb.train(params=params, train_set=train_data, valid_sets=valid_data,
                      early_stopping_rounds=10, num_boost_round=500)
    print('time taken to train model : ', time.time() - t1)
    print("Initial iter# %d" % model.current_iteration())
    model.save_model(model_path)
    prev_model_path = model_path
else:
    # A saved model exists: continue training from it on the next year's data.
    print('training model...')
    t1 = time.time()
    model = lgb.train(params=params, train_set=train_data, valid_sets=valid_data,
                      init_model=prev_model_path, early_stopping_rounds=10,
                      num_boost_round=500)
    print('time taken to train model : ', time.time() - t1)
    print("Initial iter# %d" % model.current_iteration())
    model.save_model(model_path)
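When I run the code, the first model trains successfully and its weights are saved. But when the next year's data comes in, the model trains, and once the early stopping condition is reached I get a malloc error: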
training model...
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 1874155, number of negative: 1678179
[LightGBM] [Info] Total Bins 792
[LightGBM] [Info] Number of data: 3552334, number of used features: 90
[292] valid_0's binary_error: 0.0120543
Training until validation scores don't improve for 10 rounds
[293] valid_0's binary_error: 0.0119233
[294] valid_0's binary_error: 0.0118339
[295] valid_0's binary_error: 0.011734
[296] valid_0's binary_error: 0.0116654
[297] valid_0's binary_error: 0.0116488
[298] valid_0's binary_error: 0.0116404
[299] valid_0's binary_error: 0.0116051
[300] valid_0's binary_error: 0.0115968
[301] valid_0's binary_error: 0.01163
[302] valid_0's binary_error: 0.0116363
[303] valid_0's binary_error: 0.0116633
[304] valid_0's binary_error: 0.0116966
[305] valid_0's binary_error: 0.0117611
[306] valid_0's binary_error: 0.0118609
[307] valid_0's binary_error: 0.0119108
[308] valid_0's binary_error: 0.0120522
[309] valid_0's binary_error: 0.0121042
[310] valid_0's binary_error: 0.0122103
Early stopping, best iteration is:
[300] valid_0's binary_error: 0.0115968
*** Error in `/venv/bin/python3.6': malloc(): memory corruption: 0x0000000020fe03b0 ***
Is my approach correct? Does anyone know a solution to this problem?
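For reference, the year-by-year scheme I have in mind can be sketched as a plain loop. This is only a minimal illustration of the control flow, not a fix for the crash. `train_incrementally`, `train_fn`, `model_dir` and the `model_<year>.txt` file naming are hypothetical names introduced here; `train_fn` is assumed to return an object with a `save_model(path)` method, as a LightGBM `Booster` does.

```python
import os


def train_incrementally(years, train_fn, model_dir):
    """Train one model per year, warm-starting each from the previous year's file.

    train_fn(year, init_model_path) is a hypothetical callback. It must return
    an object exposing save_model(path). init_model_path is None for the first
    year and the previous year's saved file for every later year.
    """
    prev_model_path = None
    saved_paths = []
    for year in years:
        # Train on this year's data, starting from the previous model if any.
        model = train_fn(year, prev_model_path)
        # Hypothetical naming scheme: one file per year.
        model_path = os.path.join(model_dir, f"model_{year}.txt")
        model.save_model(model_path)
        saved_paths.append(model_path)
        # The file just written becomes the starting point for the next year.
        prev_model_path = model_path
    return saved_paths
```

With LightGBM, `train_fn` would presumably wrap the same `lgb.train(..., init_model=init_model_path)` call as in my snippet, loading that year's `train_data` and `valid_data` first. Writing a separate file per year, rather than reusing one path, is an assumption of this sketch that keeps every intermediate model available for inspection.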