RuntimeError: the cost function returned nan values in its first output

How to fix "RuntimeError: cost function returned nan values in its first output"

I am working on a temporal model that predicts future events; a link to my Colab notebook is in the original post. I run into a problem when trying to train the model: both the training and the validation loss come out as NaN. The loss is a joint loss made up of a cross-entropy loss and a squared loss. The blog post the model is based on is linked here.

Solutions tried so far, without success: lower learning rates (0.01, 0.001, 0.0001).

class cost_function():
    def __init__(self, yhat, y, L_2=0.001, logEps=1e-8):
        # logEps : log epsilon, a very small positive value greater than 0.0
        # CE = - [ ln(y*)(y) + ln(1-y*)(1-y) ]
        self.yhat = yhat
        self.y = y
        self.logEps = logEps
        self.L_2 = L_2

        # hiddenDimsize and numClass are picked up from the enclosing (global) scope
        self.W_out = nn.Parameter(torch.randn(hiddenDimsize, numClass)*0.01)
        
    def cross_entropy(self):
        ce = -(self.y * torch.log(self.yhat + self.logEps) + (1. - self.y) * torch.log(1. - self.yhat + self.logEps))
        print("Inside CrossEntropy Loss fn : ",ce)
        return ce

    def prediction_loss(self):
        # return  (torch.sum(torch.sum(self.cross_entropy(),dim=0),dim=1)).float()/  lengths.float()
        
        tmp_tensor = torch.sum(self.cross_entropy(),dim=0)
        print("Inside PredictionLoss fn : Sum Dim 0",tmp_tensor)
        print("Inside PredictionLoss fn : Sum Dim 1",torch.sum(tmp_tensor,dim=1))
        print("Inside PredictionLoss fn : Final Result ",(torch.sum(tmp_tensor,dim=1)).float()/  lengths.float())
        return (torch.sum(tmp_tensor,dim=1)).float()/  lengths.float()
        
    def cost(self):
        print("Inside Cost fn :",torch.mean(self.prediction_loss()) + self.L_2 * (self.W_out ** 2).sum())
        return torch.mean(self.prediction_loss()) + self.L_2 * (self.W_out ** 2).sum() # regularize
    

The build_EHRNN class (I modified the forward method's parameters to work around an undefined 'h' error):

torch.manual_seed(1)

class build_EHRNN(nn.Module):
    def __init__(self, inputDimsize=4894, hiddenDimsize=200, batchSize=100, embSize=200, numClass=4894, dropout=0.5, logEps=1e-8):  # hiddenDimsize is used as a scalar below
        super(build_EHRNN,self).__init__()
        
        self.inputDimsize = inputDimsize
        self.hiddenDimsize = hiddenDimsize
        self.numClass = numClass
        self.embSize = embSize
        self.batchSize = batchSize
        self.dropout = nn.Dropout(p=dropout)
        self.logEps = logEps
        
        
        # Embedding inputs
        self.W_emb = nn.Parameter(torch.randn(self.inputDimsize,self.embSize).cuda())
        self.b_emb = nn.Parameter(torch.zeros(self.embSize).cuda())
        
        self.W_out = nn.Parameter(torch.randn(self.hiddenDimsize,self.numClass).cuda())
        self.b_out = nn.Parameter(torch.zeros(self.numClass).cuda())
         
        self.params = [self.W_emb,self.W_out,self.b_emb,self.b_out] 
    
    # def forward(self, x, h, lengths, mask):   # original signature, before removing h
    def forward(self, x, mask):                 # x is passed in explicitly, matching the model(x, mask) call below
        self.emb = torch.tanh(torch.matmul(x,self.W_emb) + self.b_emb)
        input_values = self.emb
        self.outputs = [input_values]
        for i,hiddenSize in enumerate([self.hiddenDimsize,self.hiddenDimsize]):  # iterate over layers
            rnn = EHRNN(self.inputDimsize,hiddenSize,self.embSize,self.batchSize,self.numClass) # calculate hidden states
            hidden_state = []
            h = self.init_hidden().cuda()
            for i,seq in enumerate(input_values): # loop over sequences in each batch
                h = rnn(seq,h)                    
                hidden_state.append(h)    
            hidden_state = self.dropout(torch.stack(hidden_state))    # apply dropout between layers
            input_values = hidden_state
       
        y_linear = torch.matmul(hidden_state,self.W_out)  + self.b_out # fully connected layer
        yhat = F.softmax(y_linear,dim=1)  # yhat
        yhat = yhat*mask[:,:,None]   # apply mask
        
        # Loss calculation
        cross_entropy = -(y * torch.log(yhat + self.logEps) + (1. - y) * torch.log(1. - yhat + self.logEps))
        last_step = -torch.mean(y[-1] * torch.log(yhat[-1] + self.logEps) + (1. - y[-1]) * torch.log(1. - yhat[-1] + self.logEps))
        prediction_loss = torch.sum(torch.sum(cross_entropy, dim=0), dim=1) / torch.cuda.FloatTensor(lengths)
        cost = torch.mean(prediction_loss) + 0.000001 * (self.W_out ** 2).sum() # regularize
        return (yhat,hidden_state,cost)

    def init_hidden(self):
        return torch.zeros(self.batchSize,self.hiddenDimsize)  # initial state

Model training

artificalData_seqs = np.array(pickle.load(open(os.path.join(GOOGLE_DRV_PATH,BASE_DIR,'data.encodedDxs'),'rb')))
train,test,valid = load_data(artificalData_seqs,artificalData_seqs)

batchSize = 50     # decreased from 100 to 50
n_batches = int(np.ceil(float(len(train[0])) / float(batchSize)))-1
n_batches_valid = int(np.ceil(float(len(valid[0])) / float(batchSize)))-1
model = build_EHRNN(inputDimsize=4894,hiddenDimsize=200,batchSize=50,logEps=1e-8)
model = model.to(device)



import torch.nn.functional as F
import pdb

optimizer = torch.optim.Adadelta(model.parameters(),lr = 0.001,rho=0.95)
epochs = 10

counter = 0
# with torch.autograd.detect_anomaly():
for e in range(epochs):
    for x,y in train_dl:
        x,mask,lengths = padding(x,inputDimsize,numClass)
        output, h, _ = model(x, mask)   # forward returns (yhat, hidden_state, cost); the returned cost is unused here
        
        loss = cost_function(output,y).cost()
        # pdb.set_trace()
        loss.backward()
        print("loss ",loss)
        nn.utils.clip_grad_norm_(model.parameters(),5) # Constraining the weight matrix directly == regularization. 
        optimizer.step()
        optimizer.zero_grad()
    
    with torch.no_grad():
            model.eval()
            val_loss = []
            for x_valid,y_valid in valid_dl:
                    x_val,y_val,lengths = padding(x_valid,y_valid,numClass)
                    outputs_val, hidden_val, _ = model(x_val, mask)
                    loss = cost_function(outputs_val,y_val).cost()
                    val_loss.append(loss.item())
            model.train()

            print("Epoch: {}/{}...".format(e+1,epochs),"Step: {}...".format(counter),"Training Loss: {:.4f}...".format(loss.item()),"Val Loss: {:.4f}".format(torch.mean(torch.tensor(val_loss))))

Error output (where the loss starts becoming NaN)

Inside PredictionLoss fn : Sum Dim 0 tensor([[0.1008,0.1539,0.1211,...,0.1533,0.1218,0.1418],[0.0253,0.0449,0.0249,0.0439,0.0134,0.0332],[0.0306,0.0799,0.0570,0.0790,0.0484,0.0678],0.0450,[0.0038,0.0106,0.0098,0.0004,0.0106]],grad_fn=<SumBackward1>)
Inside PredictionLoss fn : Sum Dim 1 tensor([  372.4754,133.2620,219.1195,37.5425,141.3354,37.5070,229.2947,0.0000,379.1829,217.3962,80.1226,37.5074,138.4665,82.1034,89.7893,81.8173,92.8159,141.8856,95.9898,216.0511,133.2535,385.0391,369.4958,244.9362,37.5088,37.5087,141.6083,95.3367,735.0569,378.0407,37.5135,40.7778,82.0872,225.9998,216.6189,379.0732,81.4742,144.4226,93.3905,214.0228,37.5078,224.0793,88.3753,41.2919,140.4855,37.5086,226.6366,148.7171,137.9226,13887.5811,81.1428,84.6804,226.6779,37.5065,223.8841,220.5979,83.2484,37.5080,84.5247,384.2115,80.1173,146.9714,37.6982,134.6618,84.1838,37.5421,730.5516,37.5085,215.1523,136.5673,81.2887,94.4181,140.6268,133.9295,136.2485,386.2103,39.0282,37.5055,42.1506,80.1662,228.5819,39.3403,138.7672,1768.6033,143.5350,40.2060,147.7809,380.9214,750.6883,141.0447,136.9028,37.5049],grad_fn=<SumBackward1>)
Inside PredictionLoss fn : lengths tensor([5.,3.,4.,1.,0.,5.,2.,6.,9.,7.,1.])
Inside PredictionLoss fn : Final Result  tensor([  74.4951,44.4207,54.7799,47.1118,57.3237,nan,75.8366,54.3491,40.0613,46.1555,41.0517,44.8946,40.9086,46.4080,47.2952,47.9949,54.0128,44.4178,77.0078,73.8992,61.2340,47.2028,47.6683,122.5095,75.6081,41.0436,56.5000,54.1547,75.8146,40.7371,48.1409,46.6952,53.5057,56.0198,44.1876,46.8285,56.6591,49.5724,45.9742,1543.0646,40.5714,42.3402,56.6695,55.9710,55.1495,41.6242,42.2623,76.8423,40.0586,48.9905,44.8873,42.0919,121.7586,53.7881,45.5224,40.6443,47.2090,46.8756,44.6432,45.4162,40.0587,77.2421,40.0831,57.1455,46.2557,252.6576,47.8450,49.2603,76.1843,125.1147,47.0149,45.6343,grad_fn=<DivBackward0>)
Inside PredictionLoss fn : Sum Dim 0 tensor([[nan,nan],[nan,nan]],grad_fn=<SumBackward1>)
Inside PredictionLoss fn : Sum Dim 1 tensor([nan,grad_fn=<SumBackward1>)
Inside PredictionLoss fn : lengths tensor([2.,3.])
Inside PredictionLoss fn : Final Result  tensor([nan,grad_fn=<SumBackward1>)
Inside PredictionLoss fn : lengths tensor([3.,2.])
Inside PredictionLoss fn : Final Result  tensor([nan,grad_fn=<SumBackward1>)
Inside PredictionLoss fn : lengths tensor([4.,4.])
Inside PredictionLoss fn : Final Result  tensor([nan,grad_fn=<SumBackward1>)
Inside PredictionLoss fn : lengths tensor([1.,6.])
Inside PredictionLoss fn : Final Result  tensor([nan,8.])
Inside PredictionLoss fn : Final Result  tensor([nan,8.,grad_fn=<SumBackward1>)

Solution

The problem is the values in the lengths variable.

In your cost_function.prediction_loss, the cross-entropy loss is divided by the length of each sequence: (torch.sum(tmp_tensor,dim=1)).float()/ lengths.float()
However, if you look at the values of the lengths tensor:

Inside PredictionLoss fn : lengths tensor([5.,3.,4.,1.,0.,5.,2.,6.,9.,7.,1.])

You will notice that some of the entries are 0 (!). The corresponding values in the loss are zero as well (a zero-length sequence contributes no loss), and when you divide zero by zero you get nan.
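
To see the failure mode in isolation, here is a minimal, self-contained sketch (the numbers are made up, not your model's actual tensors) of how a zero entry in lengths turns a per-sequence loss into nan, together with two common guards: dropping empty sequences before averaging, or clamping the divisor.

import torch

# per-sequence summed cross-entropy (hypothetical values) and the matching lengths;
# the third sequence is empty, so both its loss and its length are 0
loss_per_seq = torch.tensor([372.5, 133.3, 0.0, 141.3])
lengths = torch.tensor([5., 3., 0., 4.])

print(loss_per_seq / lengths)    # tensor([74.50, 44.43, nan, 35.32]): 0/0 gives nan

# Guard 1: keep only non-empty sequences before averaging
valid = lengths > 0
print((loss_per_seq[valid] / lengths[valid]).mean())

# Guard 2: clamp the divisor so empty sequences contribute 0 instead of nan
print((loss_per_seq / lengths.clamp(min=1)).mean())

Either guard keeps the mean finite; filtering empty sequences out of the batch upstream, so they never reach the loss, works just as well.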


Some good coding practices:

  1. Where possible, use library functions rather than re-implementing them. They are usually tested, optimized, and more numerically stable.
    For example, torch.nn.CrossEntropyLoss combines the softmax and the cross-entropy loss in a numerically robust way (see the sketch after this list).

  2. The variable lengths used in the loss computation is neither an argument of the loss function nor a class member; it is picked up from the enclosing scope. You should make it an explicit parameter, as shown below.
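
As a rough illustration of both points, here is a minimal sketch of a loss helper that receives lengths as an explicit argument and leans on a built-in criterion. The name masked_sequence_loss and its signature are made up for this example; it assumes the predictions are raw (pre-softmax) logits of shape (seq_len, batch, numClass). Because your y looks like a multi-hot target, nn.BCEWithLogitsLoss is used here; if your targets were class indices, torch.nn.CrossEntropyLoss would be the drop-in instead. Treat it as a template, not a verbatim fix.

import torch
import torch.nn as nn

def masked_sequence_loss(logits, targets, lengths, l2_weight=0.001, w_out=None):
    # logits, targets: (seq_len, batch, numClass); padded timesteps should already be masked out
    # reduction='none' keeps per-element losses so we can normalize by sequence length ourselves
    criterion = nn.BCEWithLogitsLoss(reduction='none')
    per_elem = criterion(logits, targets)          # element-wise loss, same shape as logits
    per_seq = per_elem.sum(dim=(0, 2))             # total loss per sequence in the batch
    lengths = lengths.clamp(min=1).float()         # guard against zero-length sequences (the 0/0 -> nan path)
    loss = (per_seq / lengths).mean()
    if w_out is not None:                          # optional L2 penalty on the output weights
        loss = loss + l2_weight * (w_out ** 2).sum()
    return loss

Called as masked_sequence_loss(y_linear, y, lengths, w_out=model.W_out) on the pre-softmax output, it removes the hidden dependence on a global lengths and, together with the clamp, closes the 0/0 division that was producing the NaN.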

