Question

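I don't know why this code isn't working. When I put the rewards into a list, I get an error telling me the dimensions are wrong. I'm not sure what to do.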

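I'm implementing a deep Q-network for reinforcement learning. r is a 2D NumPy array where each entry is 1 divided by the distance between two stops, so closer stops get higher rewards.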
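As a minimal sketch of how such an r might be built (the question does not show this step, so the stop coordinates and the use of Euclidean distance below are assumptions), the diagonal is set to 0 instead of being left as the infinite value 1/0:

```python
import numpy as np

# Hypothetical stop coordinates (x, y); not taken from the question.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])
num_stops = len(coords)

# Pairwise Euclidean distances, shape (num_stops, num_stops).
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))

# r[i, j] = 1 / dist[i, j], so closer stops get larger rewards.
# The diagonal (distance 0) is set to 0 rather than inf.
with np.errstate(divide="ignore"):
    r = np.where(dist > 0, 1.0 / dist, 0.0)
```

Row r[i] then serves as the observation for stop i, which matches the placeholder shape [None, num_stops] when fed as a batch of rows.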

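No matter what I do, I can't get the rewards part to run properly. I'm new to TensorFlow, so this may come from my inexperience with things like TensorFlow placeholders and feed dicts.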

Thanks in advance for your help.

observations = tf.placeholder('float32', shape=[None, num_stops])

# game states: r[stop], r[next_stop], r[third_stop]

actions = tf.placeholder('int32',shape=[None]) 

rewards = tf.placeholder('float32',shape=[None])  # +1, -1 with discounts

Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, num_stops)

sample_op = tf.random.categorical(logits=Ylogits, num_samples=1)

cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions, num_stops), logits=Ylogits)

loss = tf.reduce_sum(rewards * cross_entropies)


optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=.99)
train_op = optimizer.minimize(loss)




visited_stops = []
steps = 0

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    # Start at a random stop, initialize done to false
    current_stop = random.randint(0, len(r) - 1)
    done = False

    # reset everything    
    while not done: # play a game in x steps   

        observations_list = []
        actions_list = []
        rewards_list = []

        # List all stops and their scores
        observation = r[current_stop]

        # Add the stop to the list of visited stops if it isn't
        # already there
        if current_stop not in visited_stops:
            visited_stops.append(current_stop)

        # decide where to go
        action = sess.run(sample_op, feed_dict={observations: [observation]})

        # play it, output next state, reward if we got a point, and whether the game is over
        #game_state, reward, done, info = pong_sim.step(action)
        new_stop = int(action)


        reward = r[current_stop][action]

        if len(visited_stops) == num_stops:
            done = True

        if steps >= BATCH_SIZE:
            done = True

        steps += 1

        observations_list.append(observation)
        actions_list.append(action)
        rewards.append(reward)



        #rewards_list = np.reshape(rewards, [-1, 25])
        current_stop = new_stop

    #processed_rewards = discount_rewards(rewards, args.gamma)
    #processed_rewards = normalize_rewards(rewards, args.gamma)

    print(rewards)
    sess.run(train_op, feed_dict={observations: [observations_list],
                             actions: [actions_list],
                             rewards: [rewards_list]})

Answer 1

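The line rewards.append(reward) causes the error because your rewards variable is a tensor, as you defined in rewards = tf.placeholder('float32',shape=[None]), and you can't append values to a tensor like that. You probably meant to call rewards_list.append(reward).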
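To illustrate the distinction with NumPy only (the reward values are hypothetical): per-step values go into an ordinary Python list, and that list is fed to the placeholder once. Note also that the question's feed_dict wraps each list in another pair of brackets. For rewards = tf.placeholder('float32', shape=[None]) that produces a value of shape (1, N), which TensorFlow 1.x rejects as incompatible with the placeholder's rank-1 shape; this is a likely source of the dimension error mentioned in the question. The same applies to actions (shape [None]) and observations (shape [None, num_stops]).

```python
import numpy as np

# A placeholder is a graph node, not a container, so per-step values
# are collected in an ordinary Python list and fed in once at the end.
rewards_list = []
for reward in [0.5, 0.25, 1.0]:  # hypothetical per-step rewards
    rewards_list.append(reward)

# Feeding the list as-is gives shape (3,), which matches shape=[None].
fed_ok = np.asarray(rewards_list, dtype=np.float32)

# Wrapping it again, as in feed_dict={rewards: [rewards_list]},
# gives shape (1, 3), which a rank-1 placeholder does not accept.
fed_wrapped = np.asarray([rewards_list], dtype=np.float32)

print(fed_ok.shape, fed_wrapped.shape)  # (3,) (1, 3)
```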

Also, you are initializing the variables

observations_list = []
actions_list = []
rewards_list = []

inside the loop, so on every iteration the old values are overwritten with empty lists. You probably want these 3 lines before the while not done: line.
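A minimal sketch of the corrected loop structure, assuming a hypothetical policy callable in place of sess.run(sample_op, ...) so that it runs without TensorFlow. The three lists are created once, before the loop. The sampled action is also converted to a plain int before it is used as an index and stored, because tf.random.categorical returns an array of shape (batch_size, num_samples), here (1, 1); indexing r[current_stop] with that array would give a (1, 1) reward rather than a scalar.

```python
import random
import numpy as np

def collect_episode(r, max_steps, policy):
    """Run one episode; return observations, actions and rewards as arrays.

    `policy` is a hypothetical stand-in for sess.run(sample_op, ...): it
    takes one observation row and returns a stop index.
    """
    num_stops = r.shape[0]
    # Created once, before the loop, so they accumulate across steps.
    observations_list, actions_list, rewards_list = [], [], []
    visited_stops = set()

    current_stop = random.randrange(num_stops)
    done = False
    steps = 0
    while not done:
        observation = r[current_stop]
        visited_stops.add(current_stop)

        action = int(policy(observation))        # plain int, not a (1, 1) array
        reward = float(r[current_stop, action])  # scalar reward

        observations_list.append(observation)
        actions_list.append(action)
        rewards_list.append(reward)

        steps += 1
        if len(visited_stops) == num_stops or steps >= max_steps:
            done = True
        current_stop = action

    return (np.asarray(observations_list, dtype=np.float32),
            np.asarray(actions_list, dtype=np.int32),
            np.asarray(rewards_list, dtype=np.float32))
```

With TensorFlow, the returned arrays would then be fed without extra brackets, for example feed_dict={observations: obs, actions: acts, rewards: rews}, giving shapes (N, num_stops), (N,) and (N,).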

AttributeError: 'Tensor' object has no attribute 'append'
