Q-learning: the reward for every episode comes out as 0

Time: 2020-10-01 16:37:57

Tags: python pip reinforcement-learning openai-gym q-learning

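Winter is here. You and your friends were tossing a frisbee around at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you will fall into the freezing water. There is currently an international frisbee shortage, so it is absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you will not always move in the direction you intend.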
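The "slippery" behaviour described above can be sketched as follows. This is a minimal illustration, assuming the rule used by Gym's FrozenLake implementation (the requested direction and its two perpendicular neighbours are equally likely); the helper names `slippery_outcomes` and `sample_actual_direction` are hypothetical and not part of any library.

```python
import random

# Action encoding used by Gym's FrozenLake: 0=left, 1=down, 2=right, 3=up.
ACTION_NAMES = ["left", "down", "right", "up"]


def slippery_outcomes(action):
    """Directions the agent may actually move in when it requests `action`.

    Assumption: the requested direction and the two perpendicular ones
    are each chosen with probability 1/3.
    """
    return [(action - 1) % 4, action, (action + 1) % 4]


def sample_actual_direction(action, rng):
    """Pick one of the possible outcomes uniformly at random."""
    return rng.choice(slippery_outcomes(action))


rng = random.Random(0)  # fixed seed so the run is repeatable
requested = 2           # the agent asks to move right
counts = {name: 0 for name in ACTION_NAMES}
for _ in range(3000):
    counts[ACTION_NAMES[sample_actual_direction(requested, rng)]] += 1

# "right", "down" and "up" each occur about a third of the time; "left" never.
print(counts)
```

Under this assumption the agent moves where it intended only about one time in three, which is part of why reaching the goal by chance is rare.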

SFFF    (S: starting point, safe)
FHFH    (F: frozen surface, safe)
FFFH    (H: hole, fall to your doom)
HFFG    (G: goal, where the frisbee is located)
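To make the connection between this map and the rows of a Q-table concrete, here is a small sketch that numbers the tiles row by row, which is how FrozenLake encodes states. It uses the 4x4 map shown above purely for illustration (the question itself targets the 8x8 variant), and the names `lake_map` and `state_index` are made up for this example.

```python
# The 4x4 map from the description, one string per row.
lake_map = ["SFFF", "FHFH", "FFFH", "HFFG"]
n_cols = len(lake_map[0])


def state_index(row, col):
    """Row-major state number: the index of the tile's row in a Q-table."""
    return row * n_cols + col


# Map every state number to its tile letter.
tiles = {
    state_index(r, c): letter
    for r, row in enumerate(lake_map)
    for c, letter in enumerate(row)
}

start = next(s for s, letter in tiles.items() if letter == "S")
goal = next(s for s, letter in tiles.items() if letter == "G")
holes = sorted(s for s, letter in tiles.items() if letter == "H")

print("start:", start)
print("goal:", goal)
print("holes:", holes)
```

The goal sits in the corner farthest from the start, so an agent has to cross the whole grid without landing on a hole before it ever sees a non-zero reward.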

I am implementing the Q-learning algorithm on the FrozenLake8x8-v0 problem. The reward I get in every episode is zero. What could be the reason? http://localhost:8888/lab/tree/alok%2FUntitled3.ipynb

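import random

import gym
import numpy as np

# Setup not shown in the original post; assumed here so the snippet runs.
# The code relies on the classic Gym API: reset() returns the state and
# step() returns four values.
env = gym.make("FrozenLake8x8-v0")
q_table = np.zeros((env.observation_space.n, env.action_space.n))  # assumed all-zero start

num_episodes = 5000
max_steps_per_episode = 200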

learning_rate = 0.2    # notation: η or α
discount_rate = 0.9    # notation: γ (gamma)

exploration_rate = 1   # notation: ε
max_exploration_rate = 1
min_exploration_rate = 0.1
exploration_decay_rate = 0.01

rewards_all_episodes = []  
episode_steps = []

# Q-learning algorithm
for episode in range(num_episodes):
    state = env.reset()

    done = False
    rewards_current_episode = 0

    for step in range(max_steps_per_episode):

        # Exploration-exploitation trade-off
        exploration_rate_threshold = random.uniform(0, 1)
        if exploration_rate_threshold > exploration_rate:
            action = np.argmax(q_table[state, :])   # exploit: greedy action
        else:
            action = env.action_space.sample()      # explore: random action

        new_state, reward, done, info = env.step(action)  # tuple unpacking

        # Update Q(s, a) in the Q-table
        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state               # move to the new state
        rewards_current_episode += reward

        if done:
            break

    # Exploration rate decay, applied once per episode
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

    rewards_all_episodes.append(rewards_current_episode)
    episode_steps.append(step)     # index of the last step taken in this episode
    
# Calculate and print the average reward per 50 episodes

rewards_per_50_episodes = np.split(np.array(rewards_all_episodes), num_episodes // 50)
count = 50
print("********Average rewards per 50 episodes********\n")
for r in rewards_per_50_episodes:
    print(count, ": ", str(sum(r / 50)))
    count += 50

# Print the updated Q-table
print("\n\n*******Q-table********\n")
print(q_table)
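One thing worth checking, offered as a sketch rather than a confirmed diagnosis, is how quickly the exploration rate decays compared with how long the Q-table stays empty. The snippet below reuses the hyperparameters from the question with only the standard library; the helper names `exploration_rate_at` and `q_update` are introduced here for illustration.

```python
import math

# Hyperparameters copied from the question.
learning_rate = 0.2
discount_rate = 0.9
max_exploration_rate = 1.0
min_exploration_rate = 0.1
exploration_decay_rate = 0.01


def exploration_rate_at(episode):
    """Exploration rate (epsilon) produced by the question's schedule."""
    return min_exploration_rate + (
        max_exploration_rate - min_exploration_rate
    ) * math.exp(-exploration_decay_rate * episode)


def q_update(old_value, reward, best_next_value):
    """One tabular Q-learning update, written as in the question."""
    return old_value * (1 - learning_rate) + learning_rate * (
        reward + discount_rate * best_next_value
    )


# How fast does epsilon fall?
for episode in (0, 100, 500, 1000):
    print(episode, round(exploration_rate_at(episode), 3))

# With an all-zero table and zero reward, an update leaves the entry at zero.
print(q_update(0.0, 0.0, 0.0))

# Only a transition that actually pays reward 1 makes an entry non-zero.
print(q_update(0.0, 1.0, 0.0))
```

With these settings the exploration rate drops to roughly 0.43 by episode 100 and to about 0.11 by episode 500. FrozenLake pays a reward of 1 only when the goal is reached and 0 otherwise, so if no early episode happens to reach the goal, every update keeps the table at zero. In that situation `np.argmax` on an all-zero row returns index 0, which is the "left" action, so the mostly greedy agent keeps walking left and may never see a reward. Slowing the decay (for example, a smaller `exploration_decay_rate`) or breaking ties randomly are options that may be worth trying.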

0 answers:

No answers