All action values become NaN while training A3C

How do I fix all action values becoming NaN while training A3C?

I am building an A3C with 4 processes in PyTorch, as shown in the code below.

But to my surprise, while training the A3C, all of the action values became NaN. Initially the action values were not NaN.

But after a full night of training they turned into NaN. Can someone help me figure out what the problem is?
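One way to narrow down where the first NaN appears (a minimal debugging sketch, not part of my training script) is to turn on PyTorch's anomaly detection and assert that the network outputs stay finite:

import torch

# Makes backward() raise at the op that first produced a NaN gradient.
torch.autograd.set_detect_anomaly(True)

def assert_finite(name, tensor):
    # Fail fast with a descriptive message instead of silently training on NaNs.
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"{name} contains NaN/Inf: {tensor}")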

import traceback

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class SharedAdam(torch.optim.Adam):
    """Adam whose optimizer state lives in shared memory, so every A3C worker process updates the same statistics."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.99), eps=1e-8, weight_decay=0):
        super(SharedAdam, self).__init__(params, lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
        # State initialization
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = 0
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)

                # share in memory
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()
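For context, the shared model and optimizer are wired up roughly like this (a minimal sketch with assumed names and sizes; the actual setup code is not shown in the post, and ActorCritic is defined below):

import torch.multiprocessing as mp  # the workers will be mp.Process instances

# Hypothetical wiring (names and sizes assumed): one global model whose
# parameters live in shared memory, and a single SharedAdam that every
# worker process steps on.
global_model = ActorCritic(num_inputs=10, action_space=3)
global_model.share_memory()
optimizer = SharedAdam(global_model.parameters(), lr=1e-3)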



class ActorCritic(torch.nn.Module):

    def __init__(self, num_inputs, action_space):
        super(ActorCritic, self).__init__()

        self.num_inputs = num_inputs
        self.action_space = action_space
        self.lstm = nn.LSTMCell(num_inputs, num_inputs)
        num_outputs = action_space
        self.fc1 = nn.Linear(num_inputs, 256)
        self.fc1.apply(init_weights)
        self.fc2 = nn.Linear(256, 256)
        self.fc2.apply(init_weights)
        self.critic_linear = nn.Linear(256, 1)
        self.critic_linear.apply(init_weights)
        self.actor_linear = nn.Linear(256, num_outputs)
        self.actor_linear.apply(init_weights)
        self.lstm.bias_ih.data.fill_(0)
        self.lstm.bias_hh.data.fill_(0)
        self.sig1 = nn.Sigmoid()
        self.train()

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        hx, cx = self.lstm(inputs, (hx, cx))
        x = self.sig1(self.fc1(hx))
        x = torch.tanh(self.fc2(x))
        return self.critic_linear(x), self.actor_linear(x), (hx, cx)

    def save(self, filename, directory):
        torch.save(self.state_dict(), '%s/%s_actor.pth' % (directory, filename))

    def load(self, filename, directory):
        self.load_state_dict(torch.load('%s/%s_actor.pth' % (directory, filename)))
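To make the calling convention explicit, a toy forward pass could look like the following (illustrative only: the sizes are made up, and init_weights stands in for the post's unshown helper):

import torch
import torch.nn as nn

def init_weights(m):
    # Stand-in for the post's (unshown) init_weights helper.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

# Made-up sizes: 10 inputs, 3 actions, batch of 1.
model = ActorCritic(num_inputs=10, action_space=3)
obs = torch.zeros(1, 10)
hx, cx = torch.zeros(1, 10), torch.zeros(1, 10)  # LSTMCell hidden size equals num_inputs here
value, action_values, (hx, cx) = model((obs, (hx, cx)))
print(value.shape, action_values.shape)  # torch.Size([1, 1]) torch.Size([1, 3])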

And below is the training code:

def train(rank,model,optimizer,data):
    try:
        data = data.dropna()

        count = 0

        data = torch.DoubleTensor(np.asarray(data))

        env = ENV(params.state_dim,params.action_dim,data)
        print("env created\n")
        # init training variables
        max_timesteps = data.shape[0] - 1
        state = env.reset()
        done = True
        episode_length = 0
        count = 0
        while count<max_timesteps-1:
            episode_length += 1
            if done:
                cx = Variable(torch.zeros(1,params.state_dim))
                hx = Variable(torch.zeros(1,params.state_dim))
            else:
                cx = Variable(cx.data)
                hx = Variable(hx.data)

            values = []
            log_probs = []
            rewards = []
            entropies = []
            while count < max_timesteps-1:
                value, action_values, (hx, cx) = model((Variable(state.unsqueeze(0)), (hx, cx)))
                prob = F.softmax(action_values, dim=-1)
                log_prob = F.log_softmax(action_values, dim=-1).reshape(-1,)
                entropy = -(log_prob * prob).sum(1, keepdim=True)
                entropies.append(entropy)

                action = sample(prob)

                log_prob = log_prob.gather(0, Variable(action))
         
                state,reward,done = env.step(action)
                done = (done or count == max_timesteps-2)
                reward = max(min(reward,1),-1)
                
                count +=1
                
                if done:
                    episode_length = 0
                    state = env.reset()
                    
                
                values.append(value)
                log_probs.append(log_prob)
                rewards.append(reward)
                print(ticker,"rank ",rank," action:",action,"reward ",reward)

                if done:
                    break
                
            R = torch.zeros(1, 1)
            if not done:
                value, _, _ = model((Variable(state.unsqueeze(0)), (hx, cx)))
                R = value.data
            values.append(Variable(R))
            policy_loss = 0
            value_loss = 0
            R = Variable(R)
            gae = torch.zeros(1, 1)
            for i in reversed(range(len(rewards))):
                R = params.gamma * R + rewards[i]  # n-step return for the critic
                advantage = R - values[i]
                value_loss = value_loss + 0.5 * advantage.pow(2)
                # Generalized Advantage Estimation (GAE) for the policy gradient
                TD = rewards[i] + params.gamma * values[i + 1].data - values[i].data
                gae = gae * params.gamma * params.tau + TD
                policy_loss = policy_loss - log_probs[i] * Variable(gae) - 0.01 * entropies[i]

            optimizer.zero_grad()
            (policy_loss + 0.5 * value_loss).backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(),40)
            optimizer.step()
            
    except:
        traceback.print_exc()
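The post does not show the launcher, but with 4 processes it would presumably look something like this sketch (global_model, optimizer, and data assumed to be set up as in the earlier snippet):

import torch.multiprocessing as mp

# Illustrative launcher (not part of the original post).
if __name__ == '__main__':
    processes = []
    for rank in range(4):
        p = mp.Process(target=train, args=(rank, global_model, optimizer, data))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()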

And below is the code used to sample actions:

def sample(logits):
    # Gumbel-max trick: perturb the scores with Gumbel noise and take the argmax
    noise = torch.rand(logits.shape)
    return torch.argmax(logits - torch.log(-torch.log(noise)), 1)
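One detail worth flagging here (my own observation, not confirmed as the culprit): torch.rand samples from [0, 1) and can return exactly 0, in which case log(-log(noise)) is infinite, and the function is actually called with softmax probabilities rather than raw logits. A clamped variant that keeps the double log finite could look like this:

def sample_stable(logits):
    # Same Gumbel-max sampling, but with the uniform noise clamped away
    # from 0 and 1 so the double log can never produce inf.
    eps = 1e-8  # assumed small constant
    noise = torch.rand(logits.shape).clamp(eps, 1 - eps)
    return torch.argmax(logits - torch.log(-torch.log(noise)), 1)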
