
Results mismatch between convolution algorithms in TensorFlow/CUDA

I am training a convolutional autoencoder and noticed this warning:

Tensorflow: 2.5-gpu from pip
Driver: 460.80
cuda: 11.2.2
cudnn: 8.1.1
XLA: Yes
Mixed precision: Yes
26/27 [===========================>..] - ETA: 0s - loss: 1.0554 - pre_dense_out_loss: 0.9997 - de_conv1dtranspose_out_loss: 0.5578
2021-06-05 21:28:17.678118: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 0: 95.25 vs 80.8125
2021-06-05 21:28:17.678132: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 1: 95.6875 vs 81
2021-06-05 21:28:17.678136: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 2: 95.4375 vs 82.125
2021-06-05 21:28:17.678139: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 3: 95.3125 vs 80.5625
2021-06-05 21:28:17.678141: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 4: 95.375 vs 81.3125
2021-06-05 21:28:17.678145: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 5: 94.9375 vs 79.8125
2021-06-05 21:28:17.678148: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 6: 95.3125 vs 81
2021-06-05 21:28:17.678151: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 7: 95.625 vs 82
2021-06-05 21:28:17.678153: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 8: 94.75 vs 78.5625
2021-06-05 21:28:17.678156: E tensorflow/compiler/xla/service/gpu/buffer_comparator.cc:682] Difference at 9: 95.25 vs 80.25
2021-06-05 21:28:17.678170: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:545] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
%custom-call.20 = (f16[1,5,24,24]{2,1,3},u8[0]{0}) custom-call(f16[3778,50,24]{3,2,0} %bitcast.237,f16[3778,10,0} %arg45.46),window={size=1x5 stride=1x5},dim_labels=b01f_01io->b01f,custom_call_target="__cudnn$convBackwardFilter",Metadata={op_type="Conv2DBackpropFilter" op_name="gradient_tape/model/de_conv1dtranspose_2/conv1d_transpose/Conv2DBackpropFilter"},backend_config="{\"algorithm\":\"0\",\"tensor_ops_enabled\":false,\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" for 1+TC vs 0+TC
2021-06-05 21:28:17.678174: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:192] Device: GeForce RTX 3070
2021-06-05 21:28:17.678177: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:193] Platform: Compute Capability 8.6
2021-06-05 21:28:17.678180: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:194] Driver: 11020 (460.80.0)
2021-06-05 21:28:17.678182: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:195] Runtime: <undefined>
2021-06-05 21:28:17.678185: E tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:202] cudnn version: 8.1.1
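
For reference, the "Mixed precision: Yes" and "XLA: Yes" settings listed above are typically enabled along these lines in TF 2.5; this is a minimal sketch, not the actual training code from the question:

    import tensorflow as tf

    # Keras mixed precision: layers compute in float16 while weights stay in float32.
    tf.keras.mixed_precision.set_global_policy("mixed_float16")

    # Enable XLA auto-clustering for eligible ops (can also be set via TF_XLA_FLAGS).
    tf.config.optimizer.set_jit(True)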

This is a fresh build on Ubuntu 20.04. I did not notice this warning before when running an RTX 2060 on Windows. The input data is fairly large, so an MRE may be difficult. Does anyone know what this warning is about?

Solution

This may be an effect of accumulation in a low-precision (e.g. FP16) data type.

Which data type are you using? And which algorithms?
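
In TF 2.x, one way to answer the data-type part of that question is to inspect the active Keras mixed-precision policy; a small sketch using the public Keras API, not part of the original answer:

    import tensorflow as tf

    # Shows which dtypes are in play under mixed precision.
    policy = tf.keras.mixed_precision.global_policy()
    print(policy.name)            # e.g. "mixed_float16"
    print(policy.compute_dtype)   # dtype layers compute in, e.g. "float16"
    print(policy.variable_dtype)  # dtype weights are stored in, e.g. "float32"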

From: https://docs.nvidia.com/deeplearning/cudnn/developer-guide/index.html

  1. Mixed Precision Numerical Accuracy

When the computation precision and the output precision are not the same, it is possible that the numerical accuracy will vary from one algorithm to the other.

For example, when the computation is performed in FP32 and the output is in FP16, CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0 (ALGO_0) has lower accuracy compared to CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1 (ALGO_1). This is because ALGO_0 does not use extra workspace and is forced to accumulate the intermediate results in FP16, i.e. half-precision float, which reduces the accuracy. ALGO_1, on the other hand, uses additional workspace to accumulate the intermediate values in FP32, i.e. full-precision float.
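
A toy NumPy illustration (not from the cuDNN documentation) of why accumulating in FP16 loses accuracy: once a half-precision running sum reaches 2048, adding 1.0 no longer changes it, because adjacent representable FP16 values in that range are 2.0 apart.

    import numpy as np

    values = np.ones(4096, dtype=np.float16)

    # Accumulate in half precision: the running sum gets stuck at 2048.
    fp16_sum = np.float16(0.0)
    for v in values:
        fp16_sum = fp16_sum + v        # result stays float16 throughout

    # Accumulate in full precision instead (what ALGO_1's extra workspace allows).
    fp32_sum = values.astype(np.float32).sum()

    print(fp16_sum)   # 2048.0
    print(fp32_sum)   # 4096.0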
