How to restore a large (18 GB) Tensorforce agent after training

Asked: 2019-03-16 13:40:38

Tags: python tensorflow reinforcement-learning

I am trying to train a PPOAgent from the Tensorforce library on a dataset with around 100 variables.

However, after training and saving the agent, I am unable to restore it for further training, and the system gives me the following error:

2019-03-16 13:30:04.301893: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Out of range: Read fewer bytes than requested
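This error is raised when TensorFlow's checkpoint reader gets fewer bytes from a .data shard than the checkpoint index says it should contain, i.e. the files on disk look truncated or corrupted. The following is a minimal diagnostic sketch (assuming the checkpoint was written to ./model_new/ as in the code below) that pinpoints which variable, if any, cannot be read back:

import os
import tensorflow as tf

ckpt = tf.train.latest_checkpoint("./model_new/")
print("latest checkpoint:", ckpt)

# list_variables only reads the checkpoint index, so it usually succeeds even
# when a .data shard is truncated; actually loading each tensor is what
# reproduces the "Read fewer bytes than requested" error.
reader = tf.train.load_checkpoint(ckpt)
for name, shape in tf.train.list_variables(ckpt):
    reader.get_tensor(name)  # raises if the shard holding this tensor is short
    print(name, shape)

# Also compare the raw file sizes on disk with the ~18 GB the agent should occupy.
for fname in sorted(os.listdir("./model_new/")):
    path = os.path.join("./model_new/", fname)
    print(fname, os.path.getsize(path), "bytes")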

I have also trained and restored a smaller agent (<1 GB) successfully, so I don't believe the problem is a coding error.

My system should be perfectly capable of running the model, and training uses the CPU only. The specs are shown below:

Google Cloud VM, 8 CPUs @ 2 GHz
512 GB RAM and hard disk
Windows Server 2016
Tensorflow 1.13.1
Tensorforce 0.4.3
Anaconda Python 3.6.8

It would be great if anyone could share ideas on how to fix this; any help is highly appreciated.

My code is shown below:

from tensorforce.agents import PPOAgent

def create_network_spec():
    network_spec = [
        dict(type='flatten'),
        dict(type='dense', size=152, activation='selu',
             l2_regularization=0.007920429669304596,
             l1_regularization=0.07371959121453227),
        dict(type='dropout', rate=0.12911336420147002),
        dict(type='dense', size=228, activation='selu',
             l2_regularization=0.007920429669304596,
             l1_regularization=0.07371959121453227),
        dict(type='dropout', rate=0.12911336420147002),
        dict(type='internal_lstm', size=456, dropout=0.12911336420147002),
        dict(type='internal_lstm', size=456, dropout=0.12911336420147002),
        dict(type='internal_lstm', size=456, dropout=0.12911336420147002),
    ]
    return network_spec

def create_baseline_spec():
    baseline_spec = [
        dict(type='lstm', size=456),
        dict(type='dense', size=228, activation='selu'),
        dict(type='dropout', rate=0.12911336420147002),
        dict(type='dense', size=152, activation='selu'),
        dict(type='dropout', rate=0.12911336420147002),
        dict(type='dense', size=114, activation='selu'),
        dict(type='dropout', rate=0.12911336420147002),
    ]
    return baseline_spec

def testing_agent(environment):
    network_spec = create_network_spec()
    baseline_spec = create_baseline_spec()
    agent = PPOAgent(
        discount=0.9677484438906688,
        states=environment.states,
        actions=environment.actions,
        network=network_spec,
        states_preprocessing=None,
        actions_exploration=None,
        reward_preprocessing=None,
        update_mode=dict(
            unit='episodes',
            batch_size=800,
            frequency=12
        ),
        memory=None,
        distributions=None,
        entropy_regularization=0.8766700482699176,
        baseline_mode='states',
        baseline=dict(type='custom', network=baseline_spec),
        baseline_optimizer=dict(
            type='multi_step',
            optimizer=dict(
                type='adam',
                learning_rate=0.7342086295731822
            ),
            num_steps=17
        ),
        gae_lambda=0.9285969986276199,
        likelihood_ratio_clipping=0.9837192515975488,
        step_optimizer=dict(
            type='adam',
            learning_rate=0.7342086295731822
        ),
        subsampling_fraction=0.1,
        optimization_steps=17,
        execution=None
    )
    # Restore the previously saved model if possible; otherwise save a fresh one.
    try:
        agent.restore_model("./model_new/")
    except Exception as exc:
        print("restore failed:", exc)
        agent.save_model("./model_new/")
    return agent
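A side note on the try/except at the end: when restore_model fails, the fallback immediately calls save_model on the same directory, which can overwrite the very checkpoint being debugged. A more defensive variant (just a sketch; load_or_init_agent is a made-up helper name) only attempts the restore when TensorFlow actually reports a checkpoint in the directory, so a corrupted restore raises instead of being silently replaced:

import tensorflow as tf

def load_or_init_agent(agent, directory="./model_new/"):
    # latest_checkpoint returns None when no checkpoint index exists yet,
    # so a fresh save only happens for a genuinely empty directory.
    if tf.train.latest_checkpoint(directory) is not None:
        agent.restore_model(directory)
    else:
        agent.save_model(directory)
    return agent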

0 Answers:

No answers yet.