Fortran dot product performance: CPU time increases as the array size decreases

This is more of a theoretical question about autocorrelation functions and computer performance.

https://en.wikipedia.org/wiki/Autocorrelation#Estimation

(Sorry, I don't know how to enter equations on Stack Overflow.)

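An autocorrelation function is mostly just the dot product of an array with a copy of itself shifted by some time (index).

So, for a given correlation time, the autocorrelation uses the dot product of:

the original array from its initial index (0) to index (size - correlation_time)

and the original array from index (correlation_time) to the end (size)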

Below is a simple picture of this dot-product sum.

[1,2,3,4,5,6,7,8,9,10]
x    |  |  |  |  |  |  |  |  |
    [1,10]

to

[1,10]
x                   |  |  |  | 
                   [1,10]
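
To make the picture concrete, here is a minimal sketch of the same shifted dot product on a ten-element array. The program name, the variable `lag`, and the data 1 to 10 are illustrative assumptions, not part of the original code; `lag` corresponds to `i - 1` in the loop of acorr.f90 below. It shows that the number of multiplied pairs shrinks by one with every extra step of correlation time.

```fortran
program lag_dot_product
    implicit none
    integer, parameter :: n = 10
    real :: x(n)
    integer :: i, lag

    ! Illustrative data: x = 1, 2, ..., 10
    x = [(real(i), i = 1, n)]

    do lag = 0, n - 1
        ! Pair x(1:n-lag) with x(1+lag:n); both slices hold n - lag elements
        print *, "lag =", lag, " terms =", n - lag, &
                 " sum =", dot_product(x(1:n-lag), x(1+lag:n))
    end do
end program lag_dot_product
```

At lag 0 all ten elements are paired with themselves (the sum of squares, 385); at lag 9 only one pair is left (1 times 10).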

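From this, I believed that as the autocorrelation is computed for longer correlation times, the dot product should get noticeably smaller and cheaper to compute. The Fortran code below, however, suggests otherwise.

As the correlation time increases, the arrays in the dot product get smaller, yet the computation time grows with the correlation time!

What am I missing?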

acorr.f90

PROGRAM acorr
    real:: a,b,c,d,sum
    integer:: i,j,jsize,beginning,rate,end
    real,dimension(2000000):: Jx,corr
    integer:: skip_lines = 4
    call system_clock(beginning,rate)
   
   !Reading file here
    open(10,file='data.log',status='old')
    do i = 1,skip_lines
        read(10,*)
    end do
    do i = 1,2000000
        read(10,*) Jx(i)
    end do
    call system_clock(end)
    print *,"elapsed time for reading: ",real(end - beginning) / real(rate)
    close(10)
    jsize = size(Jx)
    print *,"Size of Jx: ",jsize
    call system_clock(end)
    !End of reading file

    !begin dot product for autocorrelation
    print *,"Loop Start Time: ",real(end - beginning) / real(rate)
    do i =1,jsize
        a = dot_product(Jx(i:jsize),Jx(1:jsize-(i-1)))
        if(i == 1) then
            call system_clock(end)
            print *,"correlation time magnitude 1e0 elapsed time: ",real(end - beginning) / real(rate)

        else if(i == 10) then
            call system_clock(end)
            print *,"correlation time magnitude 1e1 elapsed time: ",real(end - beginning) / real(rate)


        else if(i == 100) then
            call system_clock(end)
            print *,"correlation time magnitude 1e2 elapsed time: ",real(end - beginning) / real(rate)

        else if(i == 1000) then
            call system_clock(end)
            print *,"correlation time magnitude 1e3 elapsed time: ",real(end - beginning) / real(rate)
        else if(i == 10000) then
            call system_clock(end)
            print *,"correlation time magnitude 1e4 elapsed time: ",real(end - beginning) / real(rate)
        else if(i == 100000) then
            call system_clock(end)
            print *,"correlation time magnitude 1e5 elapsed time: ",real(end - beginning) / real(rate)
        else if(i == 1000000) then
            call system_clock(end)
            print *,"correlation time magnitude 1e6 elapsed time: ",real(end - beginning) / real(rate)


        end if 

    end do
    call system_clock(end)
    print *,"elapsed time: ",real(end - beginning) / real(rate)
END PROGRAM

Output

elapsed time for reading:    4.67100000    
 Size of Jx:      2000000
 Loop Start Time:    4.67100000    
 correlation time magnitude 1e0 elapsed time:    4.67299986    
 correlation time magnitude 1e1 elapsed time:    4.69500017    
 correlation time magnitude 1e2 elapsed time:    4.90100002    
 correlation time magnitude 1e3 elapsed time:    6.93699980    
 correlation time magnitude 1e4 elapsed time:    27.3729992    
 correlation time magnitude 1e5 elapsed time:    227.809006 
 correlation time magnitude 1e6 elapsed time:    1704.59399    
 elapsed time:    2249.08105

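Note: my data file contains 2,000,000 time points.

Compile: gfortran -o acorr.exe acorr.f90

System: Ubuntu Linux 20.04

Edit: I just changed the dot product line so that it takes the dot product of the whole array: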

a = dot_product(Jx,Jx)

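The result is still the same: it appears to take longer as the loop index increases.

Edit 2

It looks like I was not reading my output correctly. I added the following inside the loop: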

    call system_clock(end1)   ! assumed: end1 (an extra integer) marks the start of this iteration
    a = dot_product(Jx(i:jsize),Jx(1:jsize-(i-1)))
    a = 0
    call system_clock(end)
    write(20,*) real(end - end1) / real(rate)

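It looks like each loop iteration takes only about 1 millisecond, which is roughly the time needed for a single dot_product over the full vector. So I believe it is working as expected. The times printed in the original output are cumulative since the start of the program, and the checkpoints are spaced by powers of ten, so that output should be read on a logarithmic scale: going from 10,000 to 20,000 obviously takes far more iterations than going from 100 to 200.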
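
The same conclusion can be checked from the original output alone by dividing the time between consecutive checkpoints by the number of iterations between them. The sketch below does that arithmetic; the program and array names are hypothetical, and the times are the printed values rounded to three decimals. On the run shown above, the average cost per iteration comes out at a bit over 2 ms for the early intervals and falls to roughly 0.5 ms over the last million iterations, so the per-iteration cost does shrink as the slices get shorter.

```fortran
program per_iteration_cost
    implicit none
    integer, parameter :: ncheck = 8
    ! Loop index at each checkpoint; the last entry is the end of the loop (jsize)
    integer, parameter :: idx(ncheck) = &
        [1, 10, 100, 1000, 10000, 100000, 1000000, 2000000]
    ! Cumulative elapsed seconds copied from the reported output (rounded)
    real, parameter :: t(ncheck) = &
        [4.673, 4.695, 4.901, 6.937, 27.373, 227.809, 1704.594, 2249.081]
    integer :: k

    do k = 2, ncheck
        ! Average cost of one iteration between checkpoint k-1 and checkpoint k
        print '(a,i8,a,i8,a,f7.3,a)', "i = ", idx(k-1), " to ", idx(k), ": ", &
            1000.0 * (t(k) - t(k-1)) / real(idx(k) - idx(k-1)), " ms per iteration"
    end do
end program per_iteration_cost
```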

Solution

It looks like I was not reading my output correctly. I added the following inside the loop:

call system_clock(end1)   ! assumed: end1 (an extra integer) marks the start of this iteration
a = dot_product(Jx(i:jsize),Jx(1:jsize-(i-1)))
a = 0
call system_clock(end)
write(20,*) real(end - end1) / real(rate)
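
For anyone who wants to reproduce the per-iteration measurement without the data file, here is a self-contained sketch under a few assumptions: the array is filled with random numbers instead of being read from data.log, only a handful of loop indices are timed (the `samples` array is a hypothetical choice), and the names `t0`, `t1`, and `checksum` are not from the original program. The result of each dot product is accumulated into `checksum` so that an optimizing compiler has a reason to keep the call.

```fortran
program time_each_lag
    use iso_fortran_env, only: int64
    implicit none
    integer, parameter :: n = 2000000
    ! Hypothetical sample of loop indices, one per order of magnitude
    integer, parameter :: samples(7) = [1, 10, 100, 1000, 10000, 100000, 1000000]
    real, allocatable :: x(:)
    real :: a, checksum
    integer :: i, k
    integer(int64) :: t0, t1, rate

    allocate(x(n))
    call random_number(x)          ! synthetic stand-in for the values in data.log
    call system_clock(count_rate=rate)
    checksum = 0.0

    do k = 1, size(samples)
        i = samples(k)
        call system_clock(t0)      ! start of this single dot product
        a = dot_product(x(i:n), x(1:n-(i-1)))
        call system_clock(t1)      ! end of this single dot product
        checksum = checksum + a    ! use the result so the call is not optimized away
        print *, "i =", i, " length =", n - i + 1, &
                 " seconds =", real(t1 - t0) / real(rate)
    end do
    print *, "checksum:", checksum
end program time_each_lag
```

Timing each call separately, rather than printing the time elapsed since program start, makes the trend directly visible: the measured time should track the slice length `n - i + 1`, although individual timings this short can be noisy.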
