Q-learning: the reward for every episode comes out as 0

How to fix Q-learning when the reward for every episode comes out as 0

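Winter is here. You and your friends were tossing a frisbee around at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend.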

SFFF    (S: starting point, safe)
FHFH    (F: frozen surface, safe)
FFFH    (H: hole, fall to your doom)
HFFG    (G: goal, where the frisbee is located)
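To make the "slippery ice" rule concrete, here is a minimal pure-Python sketch of the 4x4 map above. It is not the Gym implementation. It assumes the action encoding 0=left, 1=down, 2=right, 3=up, and that each move goes in the intended direction or in one of the two perpendicular directions with probability 1/3 each. The names `slippery_step` and `random_policy_success_rate` are made up for this illustration.

```python
import random

# The 4x4 map quoted above; states are numbered row by row from the top left.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = len(GRID)

# Assumed action encoding: 0=left, 1=down, 2=right, 3=up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def slippery_step(state, action, rng):
    """Take one step on slippery ice.

    Assumption: the move actually executed is the intended one or one of
    the two perpendicular ones, each chosen with probability 1/3.
    """
    actual = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    row, col = divmod(state, N)
    d_row, d_col = MOVES[actual]
    # Bumping into the edge of the lake leaves the agent in place.
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    tile = GRID[row][col]
    reward = 1.0 if tile == "G" else 0.0  # reward only at the goal
    done = tile in "HG"                   # holes and the goal end the episode
    return row * N + col, reward, done


def random_policy_success_rate(episodes, max_steps, seed=0):
    """Fraction of episodes in which uniformly random actions reach G."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            state, reward, done = slippery_step(state, rng.randrange(4), rng)
            if done:
                wins += reward
                break
    return wins / episodes


if __name__ == "__main__":
    print(random_policy_success_rate(episodes=10000, max_steps=100))
```

The printed fraction gives a rough feel for how rarely a purely random walk reaches G, even on this small map. The 8x8 map is larger, so it is plausible that many early Q-learning episodes end with zero reward simply because the goal is never reached.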

I am implementing the Q-learning algorithm on the FrozenLake8x8-v0 problem. The reward I get in every episode is zero. What could be the reason? http://localhost:8888/lab/tree/alok%2FUntitled3.ipynb

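import random

import gym
import numpy as np

# Environment named in the question (needs a Gym release that still registers it)
env = gym.make("FrozenLake8x8-v0")

# Assumption: the Q-table starts at zero; the post does not show its initialization
q_table = np.zeros((env.observation_space.n, env.action_space.n))

num_episodes = 5000
max_steps_per_episode = 200

learning_rate = 0.2    # notation: η or α
discount_rate = 0.9    # notation: γ (gamma)

exploration_rate = 1   # notation: ε
max_exploration_rate = 1
min_exploration_rate = 0.1
exploration_decay_rate = 0.01

rewards_all_episodes = []
episode_steps = []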

# Q-learning algorithm
for episode in range(num_episodes):
    # Older Gym API: reset() returns only the state.
    # Newer Gym/Gymnasium releases return (state, info) instead.
    state = env.reset()

    done = False
    rewards_current_episode = 0

    for step in range(max_steps_per_episode):

        # Exploration-exploitation trade-off
        exploration_rate_threshold = random.uniform(0, 1)
        if exploration_rate_threshold > exploration_rate:
            # Exploit. np.argmax returns the first index on ties,
            # so an all-zero row always yields action 0.
            action = np.argmax(q_table[state, :])
        else:
            # Explore with a uniformly random action
            action = env.action_space.sample()

        # Older Gym API: step() returns a 4-tuple
        new_state, reward, done, info = env.step(action)

        # Update the Q-table entry for Q(s, a)
        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state               # move to the new state
        rewards_current_episode += reward

        if done:
            break

    # Exploration rate decay, applied once per episode
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

    rewards_all_episodes.append(rewards_current_episode)
    episode_steps.append(step)     # index of the last step taken in this episode
    
# Calculate and print the average reward per 50 episodes

rewards_per_50_episodes = np.split(np.array(rewards_all_episodes), num_episodes // 50)
count = 50
print("********Average rewards per 50 episodes********\n")
for r in rewards_per_50_episodes:
    print(count, ": ", str(r.mean()))
    count += 50

# Print the updated Q-table
print("\n\n*******Q-table********\n")
print(q_table)
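One thing that may be worth checking is how quickly the exploration rate falls with these hyperparameters. The sketch below only re-evaluates the decay formula from the code above at a few episode numbers. The helper name `exploration_rate_at` is made up for this illustration, and it uses `math.exp` instead of `np.exp` so that it runs without NumPy.

```python
import math

# Values copied from the hyperparameters in the question.
max_exploration_rate = 1.0
min_exploration_rate = 0.1
exploration_decay_rate = 0.01


def exploration_rate_at(episode):
    """Epsilon after the decay step of the given episode (same formula as above)."""
    return min_exploration_rate + (
        max_exploration_rate - min_exploration_rate
    ) * math.exp(-exploration_decay_rate * episode)


if __name__ == "__main__":
    for episode in (0, 100, 500, 1000, 4999):
        print(episode, round(exploration_rate_at(episode), 4))
```

With a decay rate of 0.01, epsilon is already close to its floor of 0.1 after roughly 500 of the 5000 episodes. From then on most actions are greedy, and if the Q-table is still all zeros at that point, `np.argmax` picks action 0 in every state. This could be one contributor to the zero rewards. Trying a slower decay such as 0.001 may be worthwhile, although that is a guess and not a verified fix.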