OOM in a TensorFlow DQN with a large action set

Asked: 2018-07-13 10:00:50

标签: tensorflow deep-learning out-of-memory reinforcement-learning

The following code is for a Deep Q Network in TensorFlow.

When I run this, an OOM error occurs while the TensorFlow variables are being initialized. I think this is because my experimental model has a much larger action set (e.g. 3,125,000 actions). I built my own simulator, which returns the state as a tuple (about 50 to 100 numbers), so I don't need convolutional layers.
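
For scale, here is a rough back-of-the-envelope estimate of just the final dense layer's kernel under those numbers (assuming an 800-unit hidden layer feeding a 3,125,000-way output in float32, as in the code below). Note that the eval and target networks each hold such a kernel, and RMSProp keeps additional slot variables per weight:

    # Size of one 800 x 3,125,000 float32 weight matrix
    kernel_bytes = 800 * 3125000 * 4      # float32 = 4 bytes per weight
    print(kernel_bytes / 2**30)           # ~9.3 GiB for a single kernel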

The error message looks like this:

    tensorflow/stream_executor/cuda/cuda_driver.cc:967] failed to alloc 17179869184 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
    ./tensorflow/core/common_runtime/gpu/pool_allocator.h:195] could not allocate pinned host memory of size: 1179869184

How can I solve this problem? Please help me...

The system environment is Ubuntu 16.04 with a TITAN X (Pascal).

    import numpy as np
    import tensorflow as tf

    class DQN:
        def __init__(self, n_features, n_action, lr, dr, max_e_greedy, e_increment, replace_target_iter, memory_size, batch_size):
            self.n_features = n_features
            self.n_actions = n_action
            self.lr = lr
            self.gamma = dr
            self.max_e = max_e_greedy
            self.replace_target_iter = replace_target_iter
            self.memory_size = memory_size
            self.batch_size = batch_size
            self.e = 0
            self.e_increment = e_increment
            self.learn_step = 0
            self.memory = np.zeros((self.memory_size, n_features*2 + 2))
            self.cost_his = []

            self.build_net()
            t_params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='target_net')
            e_params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='eval_net')

            with tf.variable_scope('soft_replacement'):
                self.target_replace_op = [tf.assign(t, e) for t, e in zip(t_params, e_params)]

            self.sess = tf.Session()
            self.sess.run(tf.global_variables_initializer())

        def build_net(self):
            self.s = tf.placeholder(tf.float32, [None, self.n_features], name='s')  # input State
            self.s_ = tf.placeholder(tf.float32, [None, self.n_features], name='s_')  # input Next State
            self.r = tf.placeholder(tf.float32, [None, ], name='r')  # input Reward
            self.a = tf.placeholder(tf.int32, [None, ], name='a')  # input Action

            w_initializer, b_initializer = tf.random_normal_initializer(0., 0.3), tf.constant_initializer(0.1)

            with tf.variable_scope('eval_net'):
                e1 = tf.layers.dense(self.s, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='e1')
                e2 = tf.layers.dense(e1, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='e2')
                #e3 = tf.layers.dense(e2, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='e3')
                #e4 = tf.layers.dense(e3, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='e4')
                self.q_eval = tf.layers.dense(e2, self.n_actions, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='q')

            with tf.variable_scope('target_net'):
                t1 = tf.layers.dense(self.s_, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='t1')
                t2 = tf.layers.dense(t1, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='t2')
                #t3 = tf.layers.dense(t2, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='t3')
                #t4 = tf.layers.dense(t3, 800, tf.nn.relu, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='t4')
                self.q_next = tf.layers.dense(t2, self.n_actions, kernel_initializer=w_initializer, bias_initializer=b_initializer, name='t5')

            with tf.variable_scope('q_target'):
                q_target = self.r + self.gamma * tf.reduce_max(self.q_next, axis=1, name='Qmax_s_')
                self.q_target = tf.stop_gradient(q_target)
            with tf.variable_scope('q_eval'):
                a_indices = tf.stack([tf.range(tf.shape(self.a)[0], dtype=tf.int32), self.a], axis=1)
                self.q_eval_wrt_a = tf.gather_nd(params=self.q_eval, indices=a_indices)    # shape=(None, )
            with tf.variable_scope('loss'):
                self.loss = tf.reduce_mean(tf.squared_difference(self.q_target, self.q_eval_wrt_a, name='TD_error'))
            with tf.variable_scope('train'):
                self._train_op = tf.train.RMSPropOptimizer(self.lr).minimize(self.loss)         
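
For reference, a hypothetical instantiation with the quoted action count (all argument values other than n_action are made up for illustration); just building this graph and running the variable initializer in __init__ is what triggers the failed allocation:

    agent = DQN(n_features=100, n_action=3125000, lr=0.00025, dr=0.99,
                max_e_greedy=0.9, e_increment=0.0001, replace_target_iter=300,
                memory_size=10000, batch_size=32)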

Thanks!

1 answer:

Answer 0 (score: 0)

TensorFlow tries to allocate a fraction of GPU memory, per_process_gpu_memory_fraction, for the process (see the comments on GPUOptions in the TensorFlow source). By default this works out to about 95% of the card's memory. If you set this value to more than the GPU can handle, you may see a CUDA_ERROR_OUT_OF_MEMORY error. The error is also raised when another process is using the GPU and takes up memory that TensorFlow expects to own.
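
A minimal sketch of capping that fraction explicitly (TF 1.x API; the 0.4 here is an arbitrary illustrative value, not a recommendation):

    import tensorflow as tf

    # Let this process claim at most ~40% of the GPU's memory
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.4)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))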

If you want to avoid the upfront allocation, you can use allow_growth=True. With allow_growth=True, GPU memory is not preallocated and can grow as needed.
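
A minimal sketch of that setting (TF 1.x API), which you could pass to the tf.Session created in DQN.__init__ above:

    import tensorflow as tf

    # Start with a small allocation and grow it on demand
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)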