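How to resolve routine calls inside an acc parallel region
After reading how-can-a-fortran-openacc-routine-call-another-fortran-openacc-routine, I am still confused about the restrictions on calling one OpenACC routine from another.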
PROGRAM Test
  IMPLICIT NONE
CONTAINS
  SUBROUTINE OuterRoutine( N )
    !$acc routine
    IMPLICIT NONE
    INTEGER :: N
    real :: y
    INTEGER :: i
    DO i = 0,N
      call InnerRoutine( y )
    ENDDO
  END SUBROUTINE OuterRoutine
  subroutine InnerRoutine( y )
    !$acc routine
    IMPLICIT NONE
    real :: y
  END subroutine InnerRoutine
END PROGRAM Test
When I compile this with nvfortran 20.7, I get:
$ nvfortran -acc -Minfo routine.f90
outerroutine:
14,Generating acc routine seq
Generating Tesla code
22,Reference argument passing prevents parallelization: y
innerroutine:
27,Generating acc routine seq
Generating Tesla code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccr22eZdxceweL.gpu (43,14): parse invalid forward reference to function '_innerroutine_' with wrong type!
ptxas /tmp/pgaccH22eJTMb0hKD.ptx,line 1; fatal : Missing .version directive at start of file '/tmp/pgaccH22eJTMb0hKD.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-S-0155-Compiler Failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (routine_inline.f90: 1)
0 inform,0 warnings,1 severes,0 fatal for
For comparison, here is a second example, in which one "!$acc routine seq" subroutine also calls another:
module data
  integer,parameter :: maxl = 100000
  real,dimension(maxl) :: xstat
  real,dimension(:),allocatable :: yalloc
  !$acc declare create(xstat,yalloc)
  logical :: IsUsed
  !$acc declare create(IsUsed)
end module
module useit
  use data
contains
  subroutine compute(n)
    integer :: n
    integer :: i
    !$acc parallel loop present(yalloc,xstat)
    do i = 1,n
      call iprocess(i,yalloc)
    enddo
  end subroutine
  subroutine iprocess(i,yalloc)
    !$acc routine seq
    integer :: i
    real,intent(out) :: yalloc(:)
    if(IsUsed) call kernel(i,yalloc)
  contains
    subroutine kernel(i,yalloc)
      !$acc routine seq
      integer,intent(in) :: i
      real,intent(out) :: yalloc(:)
      yalloc(i) = 2*xstat(i)
    end subroutine
  end subroutine
end module
program main
  use data
  use useit
  implicit none
  integer :: nSize = 100
  !---------------------------------------------------------------------------
  call allocit(nSize)
  call initialize
  call compute(nSize)
  !$acc update self(yalloc)
  write(*,*) "yalloc(10)=",yalloc(10) ! prints 2.0 (= 2*xstat(10))
  call finalize
contains
  subroutine allocit(n)
    integer :: n
    allocate(yalloc(n))
  end subroutine allocit
  subroutine initialize
    xstat = 1.0
    yalloc = 1.0
    IsUsed = .true.
    !$acc update device(xstat,yalloc,IsUsed)
  end subroutine initialize
  subroutine finalize
    deallocate(yalloc)
  end subroutine finalize
end program main
This second example compiles and runs with OpenACC.
Update: surprisingly, the first piece of code works when I simply swap the order of the two subroutines:
PROGRAM Test
  IMPLICIT NONE
CONTAINS
  subroutine InnerRoutine( y )
    !$acc routine
    IMPLICIT NONE
    real :: y
  END subroutine InnerRoutine
  SUBROUTINE OuterRoutine( N )
    !$acc routine
    IMPLICIT NONE
    INTEGER :: N
    real :: y
    INTEGER :: i
    DO i = 0,N
      call InnerRoutine( y )
    ENDDO
  END SUBROUTINE OuterRoutine
END PROGRAM Test
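I find it very surprising that this behavior depends on the order in which the routines are defined. And why, then, does the second example above work?
Solution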
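This is a bug in the compiler's device code generation. When "InnerRoutine" is called from "OuterRoutine", the compiler correctly adds the hidden argument to the parent's stack, but the definition of "InnerRoutine" is missing it as an argument. The error comes from this mismatch between callee and caller.
I've filed a problem report, TPR #29057. It's not yet clear whether this is a larger issue or an artifact of the small test case.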
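Until the bug is fixed, one possible workaround, suggested by the fact that the module-based second example works and that reordering the routines helped, is to move the routines out of the main program's CONTAINS section into a module and define the callee first. The sketch below assumes that approach; it has not been verified against nvfortran 20.7. The module name routines_mod, the extra argument y on OuterRoutine, and the body of InnerRoutine are hypothetical additions made so the sketch does something observable. Without -acc the !$acc lines are ordinary comments, so it also builds as plain Fortran.

```fortran
! Hypothetical workaround sketch: device routines live in a module
! instead of being contained in the main program.
module routines_mod
  implicit none
contains
  ! Callee defined before its caller, mirroring the order that worked.
  subroutine InnerRoutine(y)
    !$acc routine seq
    real, intent(inout) :: y
    y = y + 1.0          ! illustrative body (not in the original)
  end subroutine InnerRoutine

  ! y is passed in explicitly (illustrative addition) so the result is visible.
  subroutine OuterRoutine(n, y)
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(inout) :: y
    integer :: i
    do i = 0, n          ! n+1 iterations, as in the question
      call InnerRoutine(y)
    end do
  end subroutine OuterRoutine
end module routines_mod
```

Because module procedures always have explicit interfaces, caller and callee see the same argument list regardless of definition order; whether that alone sidesteps the hidden-argument mismatch on a given compiler release is an assumption.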
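Note: be careful when using contained device subroutines. Fortran lets a contained routine access its parent's local variables by passing it a pointer to the parent's stack frame. If the parent runs on the host and the child runs on the device, directly accessing the parent's variables will cause a runtime error. For example, if "iprocess" were contained in "compute" and accessed "i" directly instead of receiving it as an argument, you would get an error, because the device cannot access the host's stack.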
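A minimal sketch of the safe pattern the note describes, assuming iprocess is contained in compute: the loop index and the array are passed as arguments, so iprocess never reaches into compute's stack frame through host association. The module name compute_mod, the array arguments y and v, and the body of iprocess are hypothetical, and the sketch has not been checked against a specific nvfortran release. Without -acc it runs as ordinary host code.

```fortran
! Hypothetical sketch: a contained device routine that uses only its
! own dummy arguments, never the parent's local variables.
module compute_mod
  implicit none
contains
  subroutine compute(n, y)
    integer, intent(in) :: n
    real, intent(inout) :: y(:)
    integer :: i
    !$acc parallel loop copy(y)
    do i = 1, n
      ! Pass i explicitly. Reading compute's local i from inside
      ! iprocess via host association is the case the note warns about.
      call iprocess(i, y)
    end do
  contains
    subroutine iprocess(j, v)
      !$acc routine seq
      integer, intent(in) :: j
      real, intent(inout) :: v(:)
      v(j) = 2.0 * real(j)   ! illustrative body
    end subroutine iprocess
  end subroutine compute
end module compute_mod
```

Giving the dummies different names (j, v) from compute's locals (i, y) makes it obvious at a glance that iprocess depends on nothing from its host scope.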