I'm trying to train a Q-learning model with TensorFlow and Keras, using some code I wrote a long time ago. But when I run it, I get this error:
Using TensorFlow backend.
Traceback (most recent call last):
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\eager\execute.py", line 206, in make_shape
    shape = tensor_shape.as_shape(v)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 1216, in as_shape
    return TensorShape(shape)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 776, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 718, in as_dimension
    return Dimension(value)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\framework\tensor_shape.py", line 193, in __init__
    self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'TimeLimit'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\neelg\Documents\Atom_projects\Main\test.py", line 66, in <module>
    agent = DQNAgent(env, 450)
  File "C:\Users\neelg\Documents\Atom_projects\Main\test.py", line 25, in __init__
    self.model = self._build_model()
  File "C:\Users\neelg\Documents\Atom_projects\Main\test.py", line 31, in _build_model
    model.add(Dense(24, input_dim=self.state_size, activation='relu'))
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\engine\sequential.py", line 162, in add
    name=layer.name + '_input')
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\engine\input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\engine\input_layer.py", line 87, in __init__
    name=self.name)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\backend\tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\keras\backend\tensorflow_backend.py", line 736, in placeholder
    shape=shape, ndim=ndim, dtype=dtype, sparse=sparse, name=name)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\keras\backend.py", line 1057, in placeholder
    x = array_ops.placeholder(dtype, shape=shape, name=name)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\ops\array_ops.py", line 2630, in placeholder
    return gen_array_ops.placeholder(dtype=dtype, shape=shape, name=name)
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\ops\gen_array_ops.py", line 6669, in placeholder
    shape = _execute.make_shape(shape, "shape")
  File "C:\Users\neelg\TF2_GPU\lib\site-packages\tensorflow_core\python\eager\execute.py", line 208, in make_shape
    raise TypeError("Error converting %s to a TensorShape: %s." % (arg_name, e))
TypeError: Error converting shape to a TensorShape: int() argument must be a string, a bytes-like object or a number, not 'TimeLimit'.
I can't see any usage of 'TimeLimit' anywhere, so I'm not sure what's going on. This error has been holding me up badly. I'm using TensorFlow 2.0-gpu, but I get the same error on the stock version of TensorFlow as well. Any help would be appreciated :)
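For reference, this minimal check (assuming the standard OpenAI Gym package) shows one place a TimeLimit object can come from: gym.make wraps the raw environment in a gym.wrappers.TimeLimit wrapper, so the environment object itself has that type:

import gym

env = gym.make('CartPole-v0')
# gym.make wraps the raw CartPole environment in a TimeLimit wrapper,
# so type(env) is the wrapper class, not the raw environment class
print(type(env))                     # <class 'gym.wrappers.time_limit.TimeLimit'>
print(env.observation_space.shape)   # (4,)  -> observation size for CartPole
print(env.action_space.n)            # 2     -> number of actions for CartPole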
Here is my code:
"""
Created on Mon Oct 14 22:01:50 2019
@author: neel
"""
import random  # used by act() and replay() below

import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers.core import Dense
from keras.optimizers import Adam  # used by _build_model() below
from collections import deque
import gym
class DQNAgent:
    def __init__(self, state_size, action_size):
        self.state_size = state_size
        self.action_size = action_size
        self.memory = deque(maxlen=2000)
        self.gamma = 0.95    # discount rate
        self.epsilon = 1.0   # exploration rate
        self.epsilon_min = 0.01
        self.epsilon_decay = 0.995
        self.learning_rate = 0.001
        self.model = self._build_model()

    def _build_model(self):
        # Neural Net for Deep-Q learning Model
        model = Sequential()
        model.add(Dense(24, input_dim=self.state_size, activation='relu'))
        model.add(Dense(24, activation='relu'))
        model.add(Dense(self.action_size, activation='linear'))
        model.compile(loss='mse',
                      optimizer=Adam(lr=self.learning_rate))
        return model

    def remember(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def act(self, state):
        if np.random.rand() <= self.epsilon:
            return random.randrange(self.action_size)
        act_values = self.model.predict(state)
        return np.argmax(act_values[0])  # returns action

    def replay(self, batch_size):
        minibatch = random.sample(self.memory, batch_size)
        for state, action, reward, next_state, done in minibatch:
            target = reward
            if not done:
                target = reward + self.gamma * \
                    np.amax(self.model.predict(next_state)[0])
            target_f = self.model.predict(state)
            target_f[0][action] = target
            self.model.fit(state, target_f, epochs=1, verbose=0)
        if self.epsilon > self.epsilon_min:
            self.epsilon *= self.epsilon_decay
if __name__ == "__main__":
    # initialize gym environment and the agent
    env = gym.make('CartPole-v0')
    agent = DQNAgent(env, 450)
    env.render()
    episodes = 1000  # number of games to play
    # Iterate the game
    for e in range(episodes):
        # reset state in the beginning of each game
        state = env.reset()
        state = np.reshape(state, [1, 4])
        # time_t represents each frame of the game
        # Our goal is to keep the pole upright as long as possible, up to a score of 500
        # the more time_t, the higher the score
        for time_t in range(500):
            # turn this on if you want to render
            env.render()
            # Decide action
            action = agent.act(state)
            # Advance the game to the next frame based on the action.
            # Reward is 1 for every frame the pole survived
            next_state, reward, done, _ = env.step(action)
            next_state = np.reshape(next_state, [1, 4])
            # Remember the previous state, action, reward, and done
            agent.remember(state, action, reward, next_state, done)
            # make next_state the new current state for the next frame.
            state = next_state
            # done becomes True when the game ends
            # ex) The agent drops the pole
            if done:
                # print the score and break out of the loop
                print("episode: {}/{}, score: {}"
                      .format(e, episodes, time_t))
                break
        # train the agent with the experience of the episode
        # (replay only once the memory holds at least one full batch;
        #  random.sample raises ValueError otherwise)
        if len(agent.memory) >= 32:
            agent.replay(32)
BTW, since I'm using CUDA with an Nvidia card, I suspect that's why the traceback is so long.
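In case it's relevant: the line the traceback points at, agent = DQNAgent(env, 450), passes the TimeLimit-wrapped environment object itself as state_size, which then flows into Dense(24, input_dim=self.state_size, ...). A minimal sketch of the call I suspect was intended, with the dimensions read from the environment's spaces instead:

# hypothetical corrected construction: pass integer sizes, not the env object
state_size = env.observation_space.shape[0]   # 4 for CartPole-v0
action_size = env.action_space.n              # 2 for CartPole-v0
agent = DQNAgent(state_size, action_size)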