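I am new to AI/ML and recently discovered h2o.
I am trying a grid search for deep learning with the Cartesian strategy, because I want to run all the different combinations. I did two runs with the same training and validation files, the same set of hyper-parameters, and the same grid.train arguments. Both runs generated the same number of models, and each model was built with the same input parameters (activation, adaptive_rate, epsilon, hidden, hidden_dropout_ratios, input_dropout_ratio, rho).
My observation is that, for the same input parameters, each run produces a model with a different logloss, mean per-class error, MSE, RMSE, etc. To rule out further user error, I restricted the grid search to a single set of parameters. My findings are below, with detailed logs.
My question is: given the same parameter set and the same training/validation frames, how can I guarantee that the generated models are exactly identical?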
Training and validation file format and data
BPS1,BPS2,ZSRTN,PCNT_RTN,PCNT_RTN100,Open,High,Low,Close,Time
58,18 , 3.00 , -0.12 , -12 , 297.2700 , 297.3100 , 297.0800 , 297.1700 , 201907050935
18,20 , 3.00 , -0.11 , -11 , 297.1800 , 297.1900 , 296.9300 , 296.9400 , 201907050940
20,20 , 5.00 , 0.01 , 1 , 296.9400 , 297.2600 , 296.8200 , 297.2150 , 201907050945
20,30 , 5.00 , 0.03 , 3 , 297.2200 , 297.2600 , 297.0400 , 297.0400 , 201907050950
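For reference, the rows above pad most fields with spaces around the commas, so a plain split leaves stray whitespace in the values. The sketch below is only an illustration using Python's standard csv module on two of the sample rows; the helper name `read_rows` is hypothetical, and the ignored-column set is copied from the `_ignored_columns` entry in the logs further down.

```python
import csv
import io

# Two sample rows copied from the data above.
SAMPLE = """BPS1,BPS2,ZSRTN,PCNT_RTN,PCNT_RTN100,Open,High,Low,Close,Time
58,18 , 3.00 , -0.12 , -12 , 297.2700 , 297.3100 , 297.0800 , 297.1700 , 201907050935
18,20 , 3.00 , -0.11 , -11 , 297.1800 , 297.1900 , 296.9300 , 296.9400 , 201907050940
"""

# Columns the logs list under "_ignored_columns", plus the response column.
IGNORED = {"Close", "PCNT_RTN100", "High", "Low", "PCNT_RTN", "Time", "Open"}
RESPONSE = "ZSRTN"


def read_rows(text):
    """Parse CSV text and strip the padding around every field (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in reader if row]


rows = read_rows(SAMPLE)
# Whatever is neither ignored nor the response is left as a predictor.
predictors = [name for name in rows[0] if name not in IGNORED and name != RESPONSE]
print(predictors)
```

With the ignored columns removed, only BPS1 and BPS2 remain as predictors for ZSRTN.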
Values
activation = RectifierWithDropout
adaptive_rate = true
epsilon = 1.0E-6
hidden = [200]
hidden_dropout_ratios = [0.1]
input_dropout_ratio = 0.05
rho = 0.9
Python code
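from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.grid.grid_search import H2OGridSearch

# Each key maps to the list of candidate values for the grid;
# here every list has a single entry, so the grid has one combination.
hyper_parameters = {
    "hidden": [[200]],
    "epsilon": [1.0E-6],
    "adaptive_rate": [True],
    "activation": ["RectifierWithDropout"],
    "input_dropout_ratio": [0.05],
    "hidden_dropout_ratios": [[0.1]],
    "rho": [0.9]
}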
.....
search_criteria = {"strategy": "Cartesian"}
.....
model_grid = H2OGridSearch(model=H2ODeepLearningEstimator,
                           grid_id=project_name,
                           hyper_params=hyper_parameters,
                           search_criteria=search_criteria)
# No seed or reproducible argument is passed here, so the defaults apply
# (the logs below show "_seed":-1 and "_reproducible":false).
model_grid.train(x=x,
                 y=response_column,
                 distribution=default_distribution, epochs=10000,
                 training_frame=train, validation_frame=test,
                 score_interval=0, stopping_rounds=5,
                 stopping_tolerance=1e-3,
                 stopping_metric="mean_per_class_error")
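As a sanity check on the search-space size of 1 reported in the run logs, a Cartesian grid can be enumerated by hand with itertools.product. This is only a sketch of the idea, not how h2o builds its grid internally; the helper name `cartesian_combinations` is hypothetical, and the dictionary assumes every hyper-parameter is given as a list of candidate values.

```python
from itertools import product

# Same single-valued search space as in the question, with every value
# wrapped in a list of candidates (an assumption made for this sketch).
hyper_parameters = {
    "hidden": [[200]],
    "epsilon": [1.0e-6],
    "adaptive_rate": [True],
    "activation": ["RectifierWithDropout"],
    "input_dropout_ratio": [0.05],
    "hidden_dropout_ratios": [[0.1]],
    "rho": [0.9],
}


def cartesian_combinations(space):
    """Return every combination of candidate values (hypothetical helper)."""
    keys = sorted(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


combos = cartesian_combinations(hyper_parameters)
print(len(combos))

# Adding a second candidate for one parameter doubles the number of models.
wider = dict(hyper_parameters, rho=[0.9, 0.99])
print(len(cartesian_combinations(wider)))
```

With one candidate per parameter there is exactly one combination, so any difference between two runs cannot come from the grid choosing different parameters.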
Preparation for the first run
07-02 09:27:45.989 192.168.123.5:54321 #7248 #75857-26 INFO: Starting gridsearch: estimated size of search space = 1
07-02 09:27:45.990 192.168.123.5:54321 #7248 FJ-1-51 INFO: Due to the grid time limit, changing model max runtime to: 1.7976931348623157E308 secs.
07-02 09:27:45.992 192.168.123.5:54321 #7248 FJ-1-51 INFO: Building H2O DeepLearning model with these parameters:
07-02 09:27:45.992 192.168.123.5:54321 #7248 FJ-1-51 INFO: {"_train":{"name":"py_1_sid_b81b","type":"Key"},"_valid":{"name":"py_2_sid_b81b","type":"Key"},"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":-1,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":["Close","PCNT_RTN100","High","Low","PCNT_RTN","Time","Open"],"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":1.7976931348623157E308,"_stopping_rounds":5,"_stopping_metric":"mean_per_class_error","_stopping_tolerance":0.001,"_response_column":"ZSRTN","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_custom_distribution_func":null,"_export_checkpoints_dir":null,"_overwrite_with_best_model":true,"_autoencoder":false,"_use_all_factor_levels":true,"_standardize":true,"_activation":"RectifierWithDropout","_hidden":[200],"_epochs":10000.0,"_train_samples_per_iteration":-2,"_target_ratio_comm_to_comp":0.05,"_adaptive_rate":true,"_rho":0.9,"_epsilon":1.0E-6,"_rate":0.005,"_rate_annealing":1.0E-6,"_rate_decay":1.0,"_momentum_start":0.0,"_momentum_ramp":1000000.0,"_momentum_stable":0.0,"_nesterov_accelerated_gradient":true,"_input_dropout_ratio":0.05,"_hidden_dropout_ratios":[0.1],"_l1":0.0,"_l2":0.0,"_max_w2":3.4028235E38,"_initial_weight_distribution":"UniformAdaptive","_initial_weight_scale":1.0,"_initial_weights":null,"_initial_biases":null,"_loss":"Automatic","_score_interval":0.0,"_score_training_samples":10000,"_score_validation_samples":
0,"_score_duty_cycle":0.1,"_classification_stop":0.0,"_regression_stop":1.0E-6,"_quiet_mode":false,"_score_validation_sampling":"Uniform","_diagnostics":true,"_variable_importances":true,"_fast_mode":true,"_force_load_balance":true,"_replicate_training_data":true,"_single_node_mode":false,"_shuffle_training_data":false,"_missing_values_handling":"MeanImputation","_sparse":false,"_col_major":false,"_average_activation":0.0,"_sparsity_beta":0.0,"_max_categorical_features":2147483647,"_reproducible":false,"_export_weights_and_biases":false,"_elastic_averaging":false,"_elastic_averaging_moving_rate":0.9,"_elastic_averaging_regularization":0.001,"_mini_batch_size":1}
07-02 09:27:45.992 192.168.123.5:54321 #7248 FJ-1-51 INFO: Dropping ignored columns: [Close, PCNT_RTN100, High, Low, PCNT_RTN, Time, Open]
07-02 09:27:45.992 192.168.123.5:54321 #7248 FJ-1-51 INFO: Dataset already contains 128 chunks. No need to rebalance.
07-02 09:27:45.993 192.168.123.5:54321 #7248 FJ-1-51 INFO: Starting model DeepLearning__gen_202007020927_m_5_r_2_b_2_pp_0.05_l_1_t_10_model_1
Results of the first run
07-02 09:27:51.513 192.168.123.5:54321 #7248 #75857-30 INFO: Hyper-Parameter Search Summary (ordered by increasing logloss):
07-02 09:27:51.513 192.168.123.5:54321 #7248 #75857-30 INFO: activation adaptive_rate epsilon hidden hidden_dropout_ratios input_dropout_ratio rho model_ids logloss
07-02 09:27:51.513 192.168.123.5:54321 #7248 #75857-30 INFO: RectifierWithDropout true 1.0E-6 [200] [0.1] 0.05 0.9 DeepLearning__gen_202007020927_m_5_r_2_b_2_pp_0.05_l_1_t_10_model_1 1.7762588168309075
Preparation for the second run
07-02 09:32:49.293 192.168.123.5:54321 #7248 #75857-29 INFO: Starting gridsearch: estimated size of search space = 1
07-02 09:32:49.293 192.168.123.5:54321 #7248 FJ-1-25 INFO: Due to the grid time limit, changing model max runtime to: 1.7976931348623157E308 secs.
07-02 09:32:49.294 192.168.123.5:54321 #7248 FJ-1-25 INFO: Building H2O DeepLearning model with these parameters:
07-02 09:32:49.294 192.168.123.5:54321 #7248 FJ-1-25 INFO: {"_train":{"name":"py_1_sid_aeed","type":"Key"},"_valid":{"name":"py_2_sid_aeed","type":"Key"},"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":-1,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":["Time","Open","PCNT_RTN","PCNT_RTN100","Low","Close","High"],"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":1.7976931348623157E308,"_stopping_rounds":5,"_stopping_metric":"mean_per_class_error","_stopping_tolerance":0.001,"_response_column":"ZSRTN","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_custom_distribution_func":null,"_export_checkpoints_dir":null,"_overwrite_with_best_model":true,"_autoencoder":false,"_use_all_factor_levels":true,"_standardize":true,"_activation":"RectifierWithDropout","_hidden":[200],"_epochs":10000.0,"_train_samples_per_iteration":-2,"_target_ratio_comm_to_comp":0.05,"_adaptive_rate":true,"_rho":0.9,"_epsilon":1.0E-6,"_rate":0.005,"_rate_annealing":1.0E-6,"_rate_decay":1.0,"_momentum_start":0.0,"_momentum_ramp":1000000.0,"_momentum_stable":0.0,"_nesterov_accelerated_gradient":true,"_input_dropout_ratio":0.05,"_hidden_dropout_ratios":[0.1],"_l1":0.0,"_l2":0.0,"_max_w2":3.4028235E38,"_initial_weight_distribution":"UniformAdaptive","_initial_weight_scale":1.0,"_initial_weights":null,"_initial_biases":null,"_loss":"Automatic","_score_interval":0.0,"_score_training_samples":10000,"_score_validation_samples":
0,"_score_duty_cycle":0.1,"_classification_stop":0.0,"_regression_stop":1.0E-6,"_quiet_mode":false,"_score_validation_sampling":"Uniform","_diagnostics":true,"_variable_importances":true,"_fast_mode":true,"_force_load_balance":true,"_replicate_training_data":true,"_single_node_mode":false,"_shuffle_training_data":false,"_missing_values_handling":"MeanImputation","_sparse":false,"_col_major":false,"_average_activation":0.0,"_sparsity_beta":0.0,"_max_categorical_features":2147483647,"_reproducible":false,"_export_weights_and_biases":false,"_elastic_averaging":false,"_elastic_averaging_moving_rate":0.9,"_elastic_averaging_regularization":0.001,"_mini_batch_size":1}
07-02 09:32:49.295 192.168.123.5:54321 #7248 FJ-1-25 INFO: Dropping ignored columns: [Time, Open, PCNT_RTN, PCNT_RTN100, Low, Close, High]
07-02 09:32:49.295 192.168.123.5:54321 #7248 FJ-1-25 INFO: Dataset already contains 128 chunks. No need to rebalance.
07-02 09:32:49.295 192.168.123.5:54321 #7248 FJ-1-25 INFO: Starting model DeepLearning__gen_202007020932_m_5_r_2_b_2_pp_0.05_l_1_t_10_model_1
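To confirm that both runs really received the same parameters, the two logged parameter dumps can be compared programmatically rather than by eye. The sketch below uses only a small subset of the logged keys, copied from the log lines above; `diff_params` and `normalize` are hypothetical helper names, and treating lists of strings as unordered is an assumption made so that the differently ordered `_ignored_columns` do not count as a difference.

```python
import json

# Subset of the parameter dump logged for the first run.
RUN_1 = json.loads("""
{"_train": {"name": "py_1_sid_b81b", "type": "Key"},
 "_valid": {"name": "py_2_sid_b81b", "type": "Key"},
 "_seed": -1,
 "_ignored_columns": ["Close", "PCNT_RTN100", "High", "Low", "PCNT_RTN", "Time", "Open"],
 "_hidden": [200],
 "_rho": 0.9,
 "_epsilon": 1.0E-6,
 "_reproducible": false}
""")

# Same subset for the second run.
RUN_2 = json.loads("""
{"_train": {"name": "py_1_sid_aeed", "type": "Key"},
 "_valid": {"name": "py_2_sid_aeed", "type": "Key"},
 "_seed": -1,
 "_ignored_columns": ["Time", "Open", "PCNT_RTN", "PCNT_RTN100", "Low", "Close", "High"],
 "_hidden": [200],
 "_rho": 0.9,
 "_epsilon": 1.0E-6,
 "_reproducible": false}
""")


def normalize(value):
    """Treat lists of strings (such as column names) as unordered."""
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return sorted(value)
    return value


def diff_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys
            if normalize(a.get(k)) != normalize(b.get(k))}


differences = diff_params(RUN_1, RUN_2)
print(sorted(differences))
```

On this subset, only the frame keys `_train` and `_valid` differ, which is expected because each session uploads the files under new names. Both runs show `_seed` as -1 and `_reproducible` as false.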
Results of the second run
07-02 09:32:53.914 192.168.123.5:54321 #7248 #75857-32 INFO: Hyper-Parameter Search Summary (ordered by increasing logloss):
07-02 09:32:53.914 192.168.123.5:54321 #7248 #75857-32 INFO: activation adaptive_rate epsilon hidden hidden_dropout_ratios input_dropout_ratio rho model_ids logloss
07-02 09:32:53.914 192.168.123.5:54321 #7248 #75857-32 INFO: RectifierWithDropout true 1.0E-6 [200] [0.1] 0.05 0.9 DeepLearning__gen_202007020932_m_5_r_2_b_2_pp_0.05_l_1_t_10_model_1 1.7002255980952898
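One thing the parameter dumps suggest trying: both runs were built with `"_seed":-1` and `"_reproducible":false`. H2O Deep Learning exposes `seed` and `reproducible` parameters, and its documentation describes `reproducible` as forcing reproducibility on small data at the cost of speed, since it uses a single thread. The sketch below only assembles the extra keyword arguments and forwards them to `grid.train`; the helper names and the default seed of 1234 are hypothetical, and I have not verified that this makes the two runs above bit-for-bit identical.

```python
def reproducible_train_kwargs(seed=1234):
    """Extra train() arguments for a repeatable run (hypothetical helper).

    A fixed, non-negative seed replaces the default of -1 seen in the logs,
    and reproducible=True asks H2O Deep Learning for single-threaded training.
    """
    if seed < 0:
        raise ValueError("use a fixed non-negative seed, not the default -1")
    return {"seed": seed, "reproducible": True}


def train_reproducibly(model_grid, seed=1234, **train_kwargs):
    """Call model_grid.train() with the caller's arguments plus the
    seed/reproducible settings (hypothetical wrapper)."""
    merged = dict(train_kwargs)
    merged.update(reproducible_train_kwargs(seed))
    model_grid.train(**merged)
    return model_grid
```

Assuming the `model_grid`, `x`, `response_column`, `train`, and `test` objects from the Python code above, a call would look like `train_reproducibly(model_grid, x=x, y=response_column, training_frame=train, validation_frame=test, epochs=10000, stopping_rounds=5)`. Running the grid twice with the same seed and comparing the logloss values would show whether the settings have the intended effect.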