Question

Thank you for reading my question.

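I'm using Keras to develop a reinforcement learning agent based on keras-rl. I want to upgrade the agent with some improvements from the OpenAI Baselines code for better action exploration, but that code uses TensorFlow only. This is my first time using TensorFlow, and I'm confused. I build my Keras deep learning models with the Model API and have never had to care about a model's internals. The code I'm referencing, however, is full of logic that reaches inside the model, modifies its weights, and fetches layer outputs directly with tf.Session(). The framework is very flexible: as in the code below, tf.Session() can evaluate a plain tensor, which is not callable, by feeding data through feed_dict. As far as I know, this is not possible in Keras.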

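If I allow tf.Session(), my architecture becomes complicated, and nobody will want to understand or use it. The only upside is that I can adapt the reference code more easily.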

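On the other hand, if I don't allow it, I have to take the existing model apart and use many K.function calls to get intermediate layer outputs and other values that the Keras model does not expose.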

import numpy as np
from keras.layers import Dense, Input, BatchNormalization
from keras.models import Model
import tensorflow as tf
import keras.backend as K
import rl2.tf_util as U


def normalize(x, stats):
    if stats is None:
        return x
    return (x - stats.mean) / (stats.std + 1e-8)

class RunningMeanStd(object):
    # Tracks a running sum, sum of squares, and count; see
    # https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
    def __init__(self, my, epsilon=1e-2, shape=()):
        self.shape = shape  # update() reads this attribute

        self._sum = K.variable(value=np.zeros(shape), dtype=tf.float32, name=my+"_runningsum")
        self._sumsq = K.variable(value=np.zeros(shape) + epsilon, dtype=tf.float32, name=my+"_runningsumsq")
        self._count = K.variable(value=np.zeros(()) + epsilon, dtype=tf.float32, name=my+"_count")

        self.mean = self._sum / self._count
        self.std = K.sqrt(K.maximum((self._sumsq / self._count) - K.square(self.mean), epsilon))

        # Placeholders that receive the per-batch increments.
        newsum = K.placeholder(shape=shape, dtype=tf.float32, name=my+'_sum')
        newsumsq = K.placeholder(shape=shape, dtype=tf.float32, name=my+'_sumsq')
        newcount = K.placeholder(shape=(), dtype=tf.float32, name=my+'_newcount')

        # Accumulate all three statistics; overwriting them would discard history.
        self.incfiltparams = K.function([newsum, newsumsq, newcount], [],
            updates=[K.update_add(self._sum, newsum),
                     K.update_add(self._sumsq, newsumsq),
                     K.update_add(self._count, newcount)])

    def update(self, x):
        x = x.astype('float64')
        n = int(np.prod(self.shape))
        # Per-batch sum, sum of squares, and sample count packed into one vector.
        # The OpenAI Baselines original all-reduces this vector across MPI workers;
        # in a single process it can be passed on directly.
        addvec = np.concatenate([x.sum(axis=0).ravel(), np.square(x).sum(axis=0).ravel(), np.array([len(x)], dtype='float64')])
        self.incfiltparams(addvec[0:n].reshape(self.shape),
                           addvec[n:2*n].reshape(self.shape),
                           addvec[2*n])

i = Input(shape=(1,))
# h = BatchNormalization()(i)
h = Dense(4, activation='relu',  kernel_initializer='he_uniform')(i)
h = Dense(10, activation='relu', kernel_initializer='he_uniform')(h)
o = Dense(1, activation='linear', kernel_initializer='he_uniform')(h)

model = Model(i, o)

obs_rms = RunningMeanStd(my='obs', shape=(1,))
normalized_obs0 = K.clip(normalize(i, obs_rms), 0, 100)

tf2 = model(normalized_obs0)

# print(model.predict(np.asarray([2,2,2,2,2]).reshape(5,)))
# print(tf(np.asarray([2,2,2,2,2]).reshape(5,)))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([tf2], feed_dict={i : U.adjust_shape(i, [np.asarray([2,]).reshape(1,)])}))
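
To make clear what the RunningMeanStd class above is meant to compute, here is a minimal plain-Python sketch of the same sum / sum-of-squares / count bookkeeping, with no TensorFlow graph or session involved. The names RunningStats and normalize_and_clip are hypothetical and are not part of Keras, keras-rl, or OpenAI Baselines. It handles scalar observations only and uses the same epsilon floor and the same 0 to 100 clipping range as the listing.

```python
import math


class RunningStats:
    """Hypothetical scalar analogue of RunningMeanStd (illustration only)."""

    def __init__(self, epsilon=1e-2):
        self.epsilon = epsilon
        self.total = 0.0          # running sum of observations
        self.total_sq = epsilon   # running sum of squared observations
        self.count = epsilon      # running number of observations

    def update(self, batch):
        # Add this batch's statistics to the running totals.
        self.total += sum(batch)
        self.total_sq += sum(v * v for v in batch)
        self.count += len(batch)

    @property
    def mean(self):
        return self.total / self.count

    @property
    def std(self):
        # Variance as E[x^2] - E[x]^2, floored at epsilon before the square root.
        var = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(var, self.epsilon))


def normalize_and_clip(x, stats, low=0.0, high=100.0):
    # Same formula as normalize() followed by K.clip() in the listing above.
    z = (x - stats.mean) / (stats.std + 1e-8)
    return min(max(z, low), high)


stats = RunningStats()
stats.update([1.0, 2.0, 3.0])
stats.update([4.0, 5.0])
print(stats.mean, stats.std)
print(normalize_and_clip(4.0, stats))
```

Because only the three totals are stored, updating with two batches gives the same statistics as updating once with all the data, up to floating-point rounding. That is the property the graph-based version relies on.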

Can I use tf.Session() in a Keras-only environment?

0 answers: