How to fix Q-learning when the reward for every episode is 0
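Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend.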
SFFF (S: starting point, safe)
FHFH (F: frozen surface, safe)
FFFH (H: hole, fall to your doom)
HFFG (G: goal, where the frisbee is located)
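The layout above can be written down as a small grid to make the tile types concrete. This is only an illustrative sketch: the names `LAYOUT`, `state_to_position`, and `tile_at` are hypothetical, and it assumes states are numbered row by row starting from the top-left corner, which is how FrozenLake reports its observations. It describes the 4x4 example map shown above, not the larger 8x8 map.

```python
# The 4x4 example map from the description, one string per row.
LAYOUT = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAYOUT[0])
N_STATES = len(LAYOUT) * N_COLS


def state_to_position(state):
    """Convert a row-major state index into a (row, col) pair."""
    return divmod(state, N_COLS)


def tile_at(state):
    """Return the tile letter (S, F, H or G) for a state index."""
    row, col = state_to_position(state)
    return LAYOUT[row][col]


# States that end the episode: holes give reward 0, the goal gives reward 1.
holes = [s for s in range(N_STATES) if tile_at(s) == "H"]
goal = next(s for s in range(N_STATES) if tile_at(s) == "G")

print("holes:", holes)  # holes: [5, 7, 11, 12]
print("goal:", goal)    # goal: 15
```

Only the step that lands on G yields a non-zero reward, so an agent that never reaches state 15 sees a reward of 0 in every episode.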
I am implementing the Q-learning algorithm on the FrozenLake8x8-v0 problem. The reward I get in every episode is zero. What could be the reason? http://localhost:8888/lab/tree/alok%2FUntitled3.ipynb
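import random

import gym
import numpy as np

# Assumed setup (not shown in the original post): the environment and a zero-initialised Q-table
env = gym.make("FrozenLake8x8-v0")
q_table = np.zeros((env.observation_space.n, env.action_space.n))

num_episodes = 5000
max_steps_per_episode = 200
learning_rate = 0.2  # notation: η or α
discount_rate = 0.9  # notation: γ (gamma)
exploration_rate = 1  # notation: ε
max_exploration_rate = 1
min_exploration_rate = 0.1
exploration_decay_rate = 0.01
rewards_all_episodes = []
episode_steps = []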
# Q-learning algorithm
for episode in range(num_episodes):
    state = env.reset()
    done = False
    rewards_current_episode = 0

    for step in range(max_steps_per_episode):
        # Exploration-exploitation trade-off
        exploration_rate_threshold = random.uniform(0, 1)
        if exploration_rate_threshold > exploration_rate:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)  # tuple unpacking

        # Update the Q-table entry for Q(s, a)
        q_table[state, action] = q_table[state, action] * (1 - learning_rate) + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state  # move to the new state
        rewards_current_episode += reward

        if done:
            break

    # Exploration rate decay
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)
    rewards_all_episodes.append(rewards_current_episode)
    episode_steps.append(step)  # this is an important step
# Calculate and print the average reward per 50 episodes
rewards_per_50_episodes = np.split(np.array(rewards_all_episodes), num_episodes // 50)
count = 50
print("********Average rewards per 50 episodes********\n")
for r in rewards_per_50_episodes:
    print(count, ": ", str(sum(r / 50)))
    count += 50

# Print the updated Q-table
print("\n\n*******Q-table********\n")
print(q_table)
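One thing that may be worth checking is how fast the exploration rate falls with these hyperparameters, since this can be tabulated without running the environment at all. The following is a minimal sketch using only the standard library; the helper name `exploration_rate_at` is hypothetical and not part of the code above, and the hyperparameter values are copied from it.

```python
import math

# Hyperparameters copied from the code above.
max_exploration_rate = 1.0
min_exploration_rate = 0.1
exploration_decay_rate = 0.01


def exploration_rate_at(episode):
    """Hypothetical helper: epsilon after the decay step of the given episode."""
    return min_exploration_rate + (
        max_exploration_rate - min_exploration_rate
    ) * math.exp(-exploration_decay_rate * episode)


# Print epsilon at a few points of the 5000-episode run.
for episode in (0, 100, 500, 1000, 4999):
    print(episode, round(exploration_rate_at(episode), 4))
```

With a decay rate of 0.01, ε is already close to its minimum of 0.1 after roughly 500 episodes. If the Q-table starts as all zeros (the initialisation is not shown above) and the goal has not been reached by then, `np.argmax` returns the first index on ties, so most greedy steps would pick action 0. This is a possible explanation to test, not a confirmed diagnosis.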