
One of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [640]] is at version 4

How do I fix "one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [640]] is at version 4"?

I want to do adversarial training with PyTorch DistributedDataParallel, using the TRADES loss function. The code runs fine in DataParallel mode, but in DistributedDataParallel mode I get the error below. When I switch the loss to AT, training runs successfully. Why does it fail with the Trades loss? The two loss functions are shown below.

-- Process 1 terminated with the following error:

Traceback (most recent call last):
  File "/home/lthpc/.conda/envs/bba/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data/zsd/defense/my_adv_training/muti_gpu_2.py", line 170, in main_worker
    train_loss = train(train_loader, model, optimizer, epoch, local_rank, args)
  File "/data/zsd/defense/my_adv_training/muti_gpu_2.py", line 208, in train
    loss = Trades(model, x, y, args.epsilon, args.step_size, args.num_steps, beta=6.0)
  File "/data/zsd/defense/my_adv_training/loss_functions.py", line 137, in Trades
    loss_kl.backward()
  File "/home/lthpc/.conda/envs/bba/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/lthpc/.conda/envs/bba/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [640]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

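The hint at the end of the traceback points to autograd anomaly detection as a way to locate the offending in-place operation. A minimal sketch of how that could be wrapped around the failing training step (the names are the ones used in the code below; putting the call once at the start of the worker works as well):

import torch

# Assumption: enable anomaly detection only while debugging; it makes every
# backward pass much slower but reports which forward op produced the tensor
# that was later modified in place.
torch.autograd.set_detect_anomaly(True)

loss = Trades(model, x, y, args.epsilon, args.step_size, args.num_steps, beta=6.0)
loss.backward(retain_graph=True)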

for i, (x, y) in enumerate(train_loader):
    # move the batch to the GPU owned by this process
    x, y = x.cuda(local_rank, non_blocking=True), y.cuda(local_rank, non_blocking=True)

    loss = Trades(model, x, y, args.epsilon, args.step_size, args.num_steps, beta=6.0)
    torch.distributed.barrier()

    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
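The loop above presupposes the usual DistributedDataParallel setup (one process per GPU launched via torch.multiprocessing.spawn, as in the traceback, a DistributedSampler, and the model wrapped in DDP), which is not shown in the question. A minimal sketch of what such a setup typically looks like; the dataset, build_model, args fields, address and port are placeholders, not taken from the original code:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def main_worker(local_rank, world_size, dataset, build_model, args):
    # single-node assumption: the global rank equals the local GPU index
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                            rank=local_rank, world_size=world_size)
    torch.cuda.set_device(local_rank)

    model = DDP(build_model().cuda(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=local_rank)
    train_loader = DataLoader(dataset, batch_size=args.batch_size,
                              sampler=sampler, num_workers=4, pin_memory=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)

    for epoch in range(args.epochs):
        sampler.set_epoch(epoch)  # reshuffle differently every epoch
        train(train_loader, model, optimizer, epoch, local_rank, args)

# launched with:
# mp.spawn(main_worker, args=(world_size, dataset, build_model, args), nprocs=world_size)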

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


def Trades(model, x, y, epsilon, step_size, num_steps, beta=6.0):
    model.eval()
    criterion_kl = nn.KLDivLoss(reduction='sum')
    # start from a slightly perturbed copy of the natural input
    x_adv = x.detach() + 0.001 * torch.randn_like(x).detach()
    nat_output = model(x)
    for _ in range(num_steps):
        x_adv.requires_grad_()
        with torch.enable_grad():
            loss_kl = criterion_kl(F.log_softmax(model(x_adv), dim=1),
                                   F.softmax(nat_output, dim=1))
        loss_kl.backward()
        eta = step_size * x_adv.grad.sign()
        x_adv = x_adv.detach() + eta
        # project back into the epsilon-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    model.train()
    x_adv = Variable(x_adv, requires_grad=False)
    optimizer.zero_grad()  # optimizer is taken from the enclosing scope in the original snippet
    # calculate robust loss
    logits = model(x)
    loss_natural = nn.CrossEntropyLoss()(logits, y)
    loss_robust = (1.0 / x.size(0)) * criterion_kl(F.log_softmax(model(x_adv), dim=1),
                                                   F.softmax(logits, dim=1))
    loss = loss_natural + beta * loss_robust
    return loss

def AT(model, x, y, epsilon, step_size, num_steps):
    model.eval()
    # start from a uniformly perturbed copy of the natural input
    x_adv = x.detach() + torch.from_numpy(
        np.random.uniform(-epsilon, epsilon, x.shape)).float().cuda()
    x_adv = torch.clamp(x_adv, 0.0, 1.0)
    for k in range(num_steps):
        x_adv.requires_grad_()
        output = model(x_adv)
        model.zero_grad()
        with torch.enable_grad():
            loss = nn.CrossEntropyLoss()(output, y)
        loss.backward()
        eta = step_size * x_adv.grad.sign()
        x_adv = x_adv.detach() + eta
        # project back into the epsilon-ball and the valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    x_adv = Variable(x_adv, requires_grad=False)

    model.train()
    logits_adv = model(x_adv)
    loss = nn.CrossEntropyLoss()(logits_adv, y)
    return loss

Solution

I changed the Trades code as shown below and the error went away, but I do not know why this works.

def trades(model, x, y, epsilon, step_size, num_steps, beta=6.0):
    model.eval()
    criterion_kl = nn.KLDivLoss(reduction='sum')
    x_adv = x.detach() + 0.001 * torch.randn_like(x).detach()
    nat_output = model(x)
    for _ in range(num_steps):
        x_adv.requires_grad_()
        with torch.enable_grad():
            loss_kl = criterion_kl(F.log_softmax(model(x_adv), dim=1),
                                   F.softmax(nat_output, dim=1))
        # compute only the gradient w.r.t. the adversarial input,
        # instead of calling loss_kl.backward()
        grad = torch.autograd.grad(loss_kl, [x_adv])[0]
        x_adv = x_adv.detach() + step_size * torch.sign(grad.detach())
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    model.train()
    x_adv = Variable(x_adv, requires_grad=False)
    optimizer.zero_grad()  # optimizer is taken from the enclosing scope in the original snippet
    # calculate robust loss
    logits = model(x)
    loss_natural = nn.CrossEntropyLoss()(logits, y)
    loss_robust = (1.0 / x.size(0)) * criterion_kl(F.log_softmax(model(x_adv), dim=1),
                                                   F.softmax(logits, dim=1))
    loss = loss_natural + beta * loss_robust
    return loss
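A plausible explanation (not verified against the exact code above): DistributedDataParallel registers hooks on every parameter's gradient accumulator, and each call to .backward() makes the reducer all-reduce and write the synchronized gradients into the parameter-side buffers in place. Calling loss_kl.backward() inside the attack loop therefore modifies parameter state between the forward passes that produced nat_output/logits and the final loss.backward(retain_graph=True), which can bump the version counters autograd checks. torch.autograd.grad(loss_kl, [x_adv]) only returns the gradient with respect to x_adv and does not accumulate anything into .grad, so DDP's hooks never fire during the attack. A toy sketch of the difference between the two ways of getting an input gradient (standalone tensors, not the model above):

import torch

x = torch.randn(4, requires_grad=True)
w = torch.randn(4, requires_grad=True)   # stands in for a model parameter
loss = (w * x).sum()

# Variant 1: backward() accumulates a gradient into .grad for every leaf,
# including w; under DDP this is what triggers the per-parameter reducer hooks.
loss.backward(retain_graph=True)
print(x.grad, w.grad)                    # both are populated

# Variant 2: autograd.grad() returns only the requested gradient and leaves
# w.grad untouched, so no parameter-gradient machinery runs.
x.grad = None
w.grad = None
(gx,) = torch.autograd.grad(loss, [x])
print(gx, w.grad)                        # w.grad is still None

As far as I can tell, the reference TRADES implementation also uses torch.autograd.grad in its inner loop, so this change follows the original formulation rather than altering the loss.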

