How to make dot_product run sequentially inside an OpenACC Fortran loop
In a Fortran program, I have a large loop that makes several dot_product calls on small vectors generated inside the loop:
program test
  implicit none

  real :: array1(2,2),array2(2,2),res(2)
  real :: subarray1(2),subarray2(2)
  integer :: i

  array1 = 1
  array2 = 2

  !$acc data copyin(array1,array2) copyout(res)
  !$acc kernels
  !$acc loop independent private(subarray1,subarray2)
  do i = 1,2
    subarray1(:) = array1(:,i)
    subarray2(:) = array2(:,i)
    res(i) = dot_product(subarray1,subarray2)
  enddo
  !$acc end kernels
  !$acc end data
  print "(2(g0,x))",res
endprogram
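When compiled with the PGI compiler, the accelerated implementation of dot_product appears to use an accelerator loop of its own, which prevents the main loop from being parallelized across both gang and vector: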
test:
     11, Generating copyin(array1(:,:)) [if not already present]
         Generating copyout(res(:)) [if not already present]
         Generating copyin(array2(:,:)) [if not already present]
     14, Loop is parallelizable
         Generating Tesla code
         14, !$acc loop gang ! blockidx%x
         15, !$acc loop vector(32) ! threadidx%x
         17, !$acc loop vector(32) ! threadidx%x
             Generating implicit reduction(+:subarray1$r)
     14, CUDA shared memory used for subarray2,subarray1
     15, Loop is parallelizable
     17, Loop is parallelizable
As the log shows, the compiler uses an implicit reduction and CUDA shared memory for the loop-private vectors.
Is there a way to force dot_product to run sequentially?
Solution
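"Is there a way to force dot_product to run sequentially?"
As long as you don't mind the array-syntax assignments also running sequentially, just add "gang vector" to the loop directive. The outer loop is then scheduled across both gangs and vectors, and the compiler runs the implicit inner loops (the array assignments and dot_product) as seq.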
% cat test.f90
program test
  implicit none

  real :: array1(2,2),array2(2,2),res(2)
  real :: subarray1(2),subarray2(2)
  integer :: i

  array1 = 1
  array2 = 2

  !$acc data copyin(array1,array2) copyout(res)
  !$acc kernels loop gang vector private(subarray1,subarray2)
  do i = 1,2
    subarray1(:) = array1(:,i)
    subarray2(:) = array2(:,i)
    res(i) = dot_product(subarray1,subarray2)
  enddo
  !$acc end data
  print "(2(g0,x))",res
endprogram
% nvfortran -acc -Minfo=accel test.f90
test:
     11, Generating copyin(array1(:,:)) [if not already present]
         Generating copyout(res(:)) [if not already present]
         Generating copyin(array2(:,:)) [if not already present]
     13, Loop is parallelizable
         Generating Tesla code
         13, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
         14, !$acc loop seq
         16, !$acc loop seq
     13, Local memory used for subarray2,subarray1
     14, Loop is parallelizable
     16, Loop is parallelizable
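
If you would rather not depend on how the compiler schedules the dot_product intrinsic, one possible alternative (not part of the answer above, and only a sketch) is to write the dot product as an explicit inner loop marked with "!$acc loop seq". The program name test_seq, the scalar s, and the index j are hypothetical names introduced here for illustration. With array1 = 1 and array2 = 2, each element of res should come out as 1*2 + 1*2 = 4.

```fortran
program test_seq
  implicit none

  real :: array1(2,2), array2(2,2), res(2)
  real :: s            ! hypothetical per-iteration accumulator
  integer :: i, j

  array1 = 1
  array2 = 2

  !$acc data copyin(array1,array2) copyout(res)
  ! Outer loop is spread across gangs and vectors; s is private per iteration.
  !$acc parallel loop gang vector private(s)
  do i = 1, 2
    s = 0.0
    ! Assumption: an explicit loop replaces dot_product, so "seq" can be
    ! requested directly instead of relying on the compiler's choice.
    !$acc loop seq
    do j = 1, 2
      s = s + array1(j,i) * array2(j,i)
    enddo
    res(i) = s
  enddo
  !$acc end data

  print "(2(g0,1x))", res
end program test_seq
```

This version also avoids the private temporary arrays subarray1 and subarray2 by indexing array1 and array2 directly. Whether it performs better than the "gang vector" fix depends on the compiler and the real vector sizes, so it is worth checking the -Minfo=accel output for your own code.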