How do I get rewards in the Atari game Bank Heist?

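I'm a bit confused about why my agent isn't getting any rewards in the Atari game Bank Heist. When I render the environment, I can watch rewards being collected after each bank robbery, but when I run the code below, no reward is reported. More interestingly, if I run the same code in another environment, such as Taxi, I do get rewards, but in Bank Heist I get nothing.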

My code is as follows:

import gym
import torch

import matplotlib.pyplot as plt


#env = gym.make('Taxi-v3')
env = gym.make('BankHeist-ram-v0')

plt.style.use('ggplot')

number_of_states = env.observation_space.shape[0]
print(number_of_states)
number_of_actions = env.action_space.n
print(number_of_actions)
#number_of_states = env.observation_space.n
#number_of_actions = env.action_space.n





gamma = 0.9

egreedy = 0.1

Q = torch.zeros([number_of_states,number_of_actions])
print(Q)
num_episodes = 1000

steps_total = []
rewards_total = []

for i_episode in range(num_episodes):

    state = env.reset()
    step = 0

    while True:

        step += 1

        # env.render()

        # epsilon-greedy: exploit with probability 1 - egreedy, otherwise explore
        random_for_egreedy = torch.rand(1)[0]

        if random_for_egreedy > egreedy:
            # add a little noise to break ties between equal Q-values
            random_values = Q[state] + torch.rand(1,number_of_actions) / 1000
            action = torch.max(random_values,1)[1][0]
            action = action.item()
        else:
            action = env.action_space.sample()

        new_state,reward,done,info = env.step(action)

        Q[state,action] = reward + gamma * torch.max(Q[new_state])

        state = new_state

        if done:
            steps_total.append(step)
            # records and prints the reward returned by the final step only
            rewards_total.append(reward)
            print("Episode finished after %i steps" % step )
            print("Episode achieved %i rewards" % reward )
            break
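
One thing that may be worth checking: the loop above appends and prints `reward` only when `done` is true, so it reports the reward of the final step rather than the sum over the episode. In Taxi the final step is usually the successful drop-off, which pays a reward, while an Atari episode typically ends on a step that pays nothing. The sketch below illustrates the difference with a toy environment. `ToyHeistEnv` and `run_episode` are hypothetical names made up for this illustration and are not part of gym; the toy only mimics the 4-tuple `step` return used in the question. This is a guess at the cause, not a confirmed diagnosis.

```python
import random


class ToyHeistEnv:
    """Hypothetical stand-in for an Atari-style env (not part of gym).

    Pays +10 on step 3 and ends on step 5 with reward 0, mimicking a game
    where points are scored mid-episode and the final step pays nothing.
    """

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        reward = 10.0 if self.t == 3 else 0.0
        done = self.t >= 5
        return self.t, reward, done, {}


def run_episode(env, choose_action):
    """Run one episode; return (steps, summed reward, reward of last step)."""
    state = env.reset()
    steps, episode_reward, last_reward = 0, 0.0, 0.0
    done = False
    while not done:
        action = choose_action(state)
        state, reward, done, _info = env.step(action)
        steps += 1
        episode_reward += reward  # accumulate every step's reward
        last_reward = reward      # what the loop in the question records
    return steps, episode_reward, last_reward


steps, episode_reward, last_reward = run_episode(
    ToyHeistEnv(), lambda state: random.randrange(4)
)
print("steps:", steps)
print("sum of rewards:", episode_reward)
print("last-step reward:", last_reward)
```

In the original loop, the equivalent change would be to keep a running total (for example an `episode_reward` variable reset at the start of each episode) and append that to `rewards_total` instead of `reward`. Separately, `BankHeist-ram-v0` returns a 128-byte RAM vector as its observation, whereas `Taxi-v3` returns a single integer, so indexing the Q-table with `state` may not behave as intended there.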
