How to fix a CUDA error returned by PyTorch Lightning: why does a print statement somehow make it go away?
I'm writing a simple trainer with PyTorch Lightning, but for some reason, 9 times out of 10, running the trainer fails with "CUDA error: device-side assert triggered". Simply printing a newline beforehand somehow seems to make it work. Any ideas why?
My code:
class Elementwise(nn.ModuleList):
    """
    A simple network container.
    Parameters are a list of modules.
    Inputs are a 3d Tensor whose last dimension is the same length
    as the list.
    Outputs are the result of applying modules to inputs elementwise.
    An optional merge parameter allows the outputs to be reduced to a
    single Tensor.
    """

    def __init__(self, merge=None, *args):
        assert merge in [None, 'first', 'concat', 'sum', 'mlp']
        self.merge = merge
        super(Elementwise, self).__init__(*args)

    def forward(self, inputs):
        inputs_ = [feat.squeeze(1) for feat in inputs.split(1, dim=1)]
        for i, j in enumerate(inputs_):
            inp = torch.tensor(j).to(device).long()
            inputs_[i] = inp
        # this does not work
        outputs = [f(x) for i, (f, x) in enumerate(zip(self, inputs_))]
        if self.merge == 'first':
            return outputs[0]
        elif self.merge == 'concat' or self.merge == 'mlp':
            return torch.cat(outputs, 1)
        elif self.merge == 'sum':
            return sum(outputs)
        else:
            return outputs
But somehow this works like magic:
class Elementwise(nn.ModuleList):
    """
    A simple network container.
    Parameters are a list of modules.
    Inputs are a 3d Tensor whose last dimension is the same length
    as the list.
    Outputs are the result of applying modules to inputs elementwise.
    An optional merge parameter allows the outputs to be reduced to a
    single Tensor.
    """

    def __init__(self, merge=None, *args):
        assert merge in [None, 'first', 'concat', 'sum', 'mlp']
        self.merge = merge
        super(Elementwise, self).__init__(*args)

    def forward(self, inputs):
        inputs_ = [feat.squeeze(1) for feat in inputs.split(1, dim=1)]
        for i, j in enumerate(inputs_):
            inp = torch.tensor(j).to(device).long()
            inputs_[i] = inp
        print("")
        outputs = [f(x) for i, (f, x) in enumerate(zip(self, inputs_))]
        if self.merge == 'first':
            return outputs[0]
        elif self.merge == 'concat' or self.merge == 'mlp':
            return torch.cat(outputs, 1)
        elif self.merge == 'sum':
            return sum(outputs)
        else:
            return outputs
Any idea how simply printing to the output fixes this error?
Edit: this error only appears when training through PyTorch Lightning's abstraction; with plain PyTorch it works fine.
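For anyone trying to reproduce: CUDA kernel launches are asynchronous, so the Python line a device-side assert is reported at is often not the one that actually triggered it. A common way to get a trustworthy stack trace (a debugging sketch, not part of my training code) is to force synchronous launches before any CUDA work happens:

```python
import os

# Force synchronous CUDA kernel launches so the device-side assert
# surfaces at the Python line that actually triggered it.
# Must be set before the first CUDA call in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, rerunning the trainer should report the failing op directly instead of at some later, unrelated line.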