How do I get rewards in the Atari game Bank Heist?
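I'm a bit confused about why my agent is not getting any reward in the Atari game Bank Heist. I monitor the reward received after each bank robbery while rendering the environment, but when I run the code below I never get any reward. More interestingly, if I run it in another environment, such as Taxi, I do get rewards, but when I try it in Bank Heist I get nothing.
My code is as follows: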
import gym
import torch
import matplotlib.pyplot as plt
#env = gym.make('Taxi-v3')
env = gym.make('BankHeist-ram-v0')
plt.style.use('ggplot')
number_of_states = env.observation_space.shape[0]
print(number_of_states)
number_of_actions = env.action_space.n
print(number_of_actions)
#number_of_states = env.observation_space.n
#number_of_actions = env.action_space.n
gamma = 0.9
egreedy = 0.1
Q = torch.zeros([number_of_states,number_of_actions])
print(Q)
num_episodes = 1000
steps_total = []
rewards_total = []
for i_episode in range(num_episodes):
    state = env.reset()
    step = 0
    while True:
        step += 1
        # env.render()
        random_for_egreedy = torch.rand(1)[0]
        if random_for_egreedy > egreedy:
            random_values = Q[state] + torch.rand(1,number_of_actions) / 1000
            action = torch.max(random_values,1)[1][0]
            action = action.item()
        else:
            action = env.action_space.sample()
        new_state,reward,done,info = env.step(action)
        Q[state,action] = reward + gamma * torch.max(Q[new_state])
        state = new_state
        if done:
            steps_total.append(step)
            rewards_total.append(reward)
            print("Episode finished after %i steps" % step )
            print("Episode achieved %i rewards" % reward )
            break
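One thing that may be worth checking: the loop above records and prints only the `reward` returned by the final step of each episode, not the sum over all steps. In Taxi the reward for a successful drop-off arrives on the last step, so it shows up; in a game where points are scored mid-episode, the last step could well return 0 even though reward was collected earlier. The sketch below illustrates the difference with a hypothetical stand-in environment (`StubEnv` and `run_episode` are made up for this illustration and are not part of gym), so it is not a claim about how Bank Heist actually pays out.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a gym environment (not part of gym).

    Pays a reward of 10 on step 3 and 0 on every other step,
    including the final one, and ends the episode after 5 steps.
    """

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = 10 if self.t == 3 else 0
        done = self.t >= 5
        return self.t, reward, done, {}


def run_episode(env, n_actions=2):
    """Run one episode with random actions.

    Returns (last_step_reward, episode_return).
    """
    env.reset()
    episode_return = 0
    while True:
        action = random.randrange(n_actions)
        _, reward, done, _ = env.step(action)
        episode_return += reward  # accumulate over every step
        if done:
            return reward, episode_return


last_reward, total = run_episode(StubEnv())
print("Last-step reward:", last_reward)
print("Episode return:", total)
```

Applied to the code in the question, this would mean keeping a running total inside the `while` loop and appending that total to `rewards_total` when `done` is true, instead of appending the last `reward`.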