Why do I get a runtime error when training a YOLOv5 model?

Traceback (most recent call last):
  File "train.py", line 519, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 300, in train
    scaler.scale(loss).backward()
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([8,64,80,80],dtype=torch.half,device='cuda',requires_grad=True)
net = torch.nn.Conv2d(64, 64, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams 
    data_type = CUDNN_DATA_HALF
    padding = [1,1,0]
    stride = [1, 1, 0]
    dilation = [1, 1, 0]
    groups = 1
    deterministic = false
    allow_tf32 = true
input: TensorDescriptor 0x7efbbc25f670
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 8, 64, 80, 80,
    strideA = 409600, 6400, 80, 1,
output: TensorDescriptor 0x7efbbc27c890
    type = CUDNN_DATA_HALF
    nbDims = 4
    dimA = 8, 64, 80, 80,
    strideA = 409600, 6400, 80, 1,
weight: FilterDescriptor 0x7efbbc2200c0
    type = CUDNN_DATA_HALF
    tensor_format = CUDNN_TENSOR_NCHW
    nbDims = 4
    dimA = 64, 64, 3, 3,
Pointer addresses: 
    input: 0x7dc4580000
    output: 0x7c6c720000
    weight: 0x7d4674a000
Additional pointer addresses: 
    grad_output: 0x7c6c720000
    grad_weight: 0x7d4674a000
Backward filter algorithm: 5
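
The tensor descriptors in the dump can be cross-checked against the repro snippet with a little arithmetic. The sketch below is plain Python and does not call PyTorch or cuDNN. The helper names `contiguous_strides` and `conv_output_size` are hypothetical and introduced only for illustration. It assumes the input is a densely packed NCHW tensor of shape [8, 64, 80, 80], as created by `torch.randn` in the snippet, and uses the standard convolution output-size formula.

```python
from typing import List, Sequence


def contiguous_strides(shape: Sequence[int]) -> List[int]:
    """Element strides of a densely packed row-major (NCHW) tensor."""
    strides = [1] * len(shape)
    # Walk from the second-to-last dimension back to the first.
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def conv_output_size(size: int, kernel: int, padding: int,
                     stride: int, dilation: int) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Shape of `data` in the repro snippet: batch 8, 64 channels, 80 x 80.
input_shape = [8, 64, 80, 80]
print(contiguous_strides(input_shape))  # [409600, 6400, 80, 1]

# 3x3 kernel, padding 1, stride 1, dilation 1 keeps the spatial size.
print(conv_output_size(80, kernel=3, padding=1, stride=1, dilation=1))  # 80
```

The leading stride values 409600 and 6400 agree with the strideA entries reported for the input descriptor, which suggests the descriptor itself is consistent with the snippet and the failure lies elsewhere (for example in the selected backward filter algorithm or the runtime environment).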
