Talos.Scan() stops after a short time, without any error, before completing the permutations

Date: 2019-06-02 18:10:48

Tags: python-3.x machine-learning keras hyperparameters talos

I have tried many options to debug this, but I cannot get Talos to run more than a couple of permutations before it stops, and it gives no hint about what went wrong. The setup seems straightforward, so what am I doing wrong?

The input data is available here

Below are my model function, the parameter space, and the talos.Scan() call. The full code is available here

# Create, compile and fit network
# This is rewritten for talos hyperparameter optimization
# Removed kernel_initializer='normal' from the dense layers in the example; the default is glorot_uniform
def createNetworkAndFit(trainVectors, trainLabels, validationVectors, validationLabels, params):
    # Create model
    model = Sequential()
    model.add(Dense(params['first_neuron'], input_dim=trainVectors.shape[1], activation=params['activation']))
    model.add(Dropout(params['dropout']))
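    # talos helper that adds params['hidden_layers'] hidden Dense layers; as I understand it,
    # their sizes are driven by params['shapes'], params['first_neuron'] and the last argument (last_neuron, here 1)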
    talos.model.layers.hidden_layers(model, params, 1)
    model.add(Dense(1, activation=params['last_activation']))
    # Compile model
    model.compile(loss=params['losses'], optimizer=params['optimizer'](), metrics=['accuracy', fmeasure_acc, 'mean_squared_error'])
    # Fit model
    history = model.fit(trainVectors, trainLabels, validation_data=[validationVectors, validationLabels], batch_size=params['batch_size'], epochs=params['epochs'], verbose=0)
    return history, model

# Define hyperparameter space
# As hidden layers are generated, "last_neuron" is the number of hidden units.
# Does this mean all hidden layers have the same number of hidden units?
p = {'first_neuron': [trainVectors.shape[1]],
    'dropout': [0, 0.25, 0.5],
    'hidden_layers': [2, 3],
    'shapes': ['brick', 'funnel'],
    'batch_size': [trainVectors.shape[0], int(trainVectors.shape[0]/10), int(trainVectors.shape[0]/100), int(trainVectors.shape[0]/1000)],
    'epochs': [300],
    'optimizer': [Nadam, Adam, RMSprop],
    'losses': [binary_crossentropy],
    'activation': [relu, elu],
    'last_activation': ['sigmoid']}

# Hyperparameter Search
experiment = talos.Scan(x=trainVectors,
                        y=trainLabels,
                        model=createNetworkAndFit,
                        grid_downsample=0.01,
                        params=p,
                        dataset_name='15000_talos',
                        experiment_no='1',
                        print_params=True,
                        disable_progress_bar=True,
                        clear_tf_session=True,
                        debug=True)

Here is my output:

Using TensorFlow backend.
{'batch_size': 312, 'hidden_layers': 3, 'activation': <function relu at 0x7f77e75e9510>, 'epochs': 300, 'optimizer': <class 'keras.optimizers.Nadam'>, 'shapes': 'brick', 'last_activation': 'sigmoid', 'losses': <function binary_crossentropy at 0x7f777dee6ae8>, 'first_neuron': 52, 'dropout': 0.25}
2019-06-02 10:46:45.248187: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-02 10:46:45.293153: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-02 10:46:45.293569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 780 major: 3 minor: 5 memoryClockRate(GHz): 0.941
pciBusID: 0000:01:00.0
totalMemory: 2.95GiB freeMemory: 2.84GiB
2019-06-02 10:46:45.293595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-02 10:46:45.478345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-02 10:46:45.478378: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-06-02 10:46:45.478395: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-06-02 10:46:45.478491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2560 MB memory) -> physical GPU (device: 0, name: GeForce GTX 780, pci bus id: 0000:01:00.0, compute capability: 3.5)
{'batch_size': 3120, 'hidden_layers': 3, 'activation': <function elu at 0x7f77e75e92f0>, 'epochs': 300, 'optimizer': <class 'keras.optimizers.RMSprop'>, 'shapes': 'brick', 'last_activation': 'sigmoid', 'losses': <function binary_crossentropy at 0x7f777dee6ae8>, 'first_neuron': 52, 'dropout': 0.5}
2019-06-02 10:46:56.373641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-02 10:46:56.373692: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-02 10:46:56.373707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-06-02 10:46:56.373712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-06-02 10:46:56.373799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2560 MB memory) -> physical GPU (device: 0, name: GeForce GTX 780, pci bus id: 0000:01:00.0, compute capability: 3.5)

EDIT1

I noticed that some of the parameters in p were not being used in the model function. After changing that, the search still stops after a short time. I have edited the code above accordingly.

1 Answer:

Answer 0 (score: 1)

The problem was that the grid_downsample value I chose (0.01) was far too small for the space of possible permutations in the grid. It would be nice if Talos gave more feedback about the size of the grid after random downsampling. Here is the Scan() call I ended up with:

# Hyperparameter Search
experiment = talos.Scan(x=trainVectors,
                        y=trainLabels,
                        model=createNetworkAndFit,
                        grid_downsample=1,
                        params=p,
                        dataset_name='15000_talos',
                        experiment_no='1',
                        print_params=True,
                        disable_progress_bar=True,
                        clear_tf_session=True,
                        debug=True)
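
For reference, a minimal back-of-the-envelope sketch (plain Python; estimate_grid is a helper I wrote for illustration, not part of Talos, and it assumes random downsampling keeps roughly that fraction of the grid) showing why 0.01 was too small: the parameter space above yields 288 permutations, so a fraction of 0.01 leaves only 2-3 runs, which matches the two parameter sets printed in the output before the scan stopped.

from functools import reduce
from operator import mul

def estimate_grid(params, grid_downsample=1.0):
    # Total number of grid permutations is the product of the option counts
    total = reduce(mul, (len(v) for v in params.values()), 1)
    # Rough count of permutations kept after random downsampling
    sampled = max(1, int(total * grid_downsample))
    return total, sampled

total, sampled = estimate_grid(p, grid_downsample=0.01)
print(total, sampled)  # 288 2 for the parameter space above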