用于分叉图的优秀 Julia 代码

如何解决用于分叉图的优秀 Julia 代码

我正在将我所有的 Matlab 代码迁移到 Julia。我有一个旧的 Matlab 脚本，它生成逻辑图的标准分叉图，速度非常快：循环 ≈0.09 秒，绘制出来的图 ≈0.11 秒（Matlab 2019a）。物流图是众所周知的，我将在此处简要介绍：?"stringdist-metrics"、x(t+1) = r*x(t)*(1-x(t))。为准确起见，我选择 0 ≤ r ≤ 4 和 maxiter = 1000。

我曾尝试将我的 Matlab 代码改写为 Julia，但我仍然是一个笨拙的 Julia 程序员。我能想到的最好的方法是让 r = LinRange(0.0,4.0,6001) 运行循环，而 Plots.jl 需要 1.341015 seconds (44.79 M allocations: 820.892 MiB,4.60% gc time) 来保存绘图的 pdf 文件（不错），而它需要大约 17s获取要在 Atom 绘图窗格中看到的绘图（这对绘图没问题）。这是在 Julia 1.5.3（在 Atom 和 VS Code 中）完成的。

如果有人可以为我下面的 Julia 代码提供一些帮助，我将不胜感激。它运行，但它看起来有点原始和缓慢。我试图改变风格并寻找性能提示（@inbounds、@simd、@avx），但总是陷入一个或另一个问题。在 Matlab 中使用比在 Julia 中快 15 倍的相同循环根本没有意义，我知道这一点。实际上，有一段我特别喜欢的 Matlab 代码（由 Steve Brunton 编写），非常简单和优雅，而且（显然）很容易用 Julia 重写；但我也在这里绊倒了。 Brunton 的循环仅运行 ≈0.04 秒，可以在下面找到。

帮助将不胜感激。谢谢。情节是通常的： enter image description here

我的 Matlab 代码：

2.266 s (6503764 allocations: 283.60 MiB)

我的 Julia 代码：

tic 
hold on % required for plotting all points
maxiter = 1000;
r1 = 0;
r4 = 4;
Tot = 6001;
r = linspace(r1,r4,Tot);   % Number of r values (6001 points)
np = length(r);
y = zeros(maxiter+1,Tot);   % Pre-allocation
y(1,1:np) = 0.5;            % Generic initial condition

for n = 1 : maxiter
   y(n+1,:) = r.*y(n,:) .* (1-y(n,:)); % Iterates all the r values at once
end
toc

tic
for n = maxiter-100 : maxiter+1  % eliminates transients
    plot(r,y(n,:),'b.','Markersize',0.01)
    grid on
    yticks([0 0.2 0.4 0.6 0.8 1])
end
hold off
toc

Steve Brunton 的 Matlab 代码：

using Plots
using BenchmarkTools
#using LoopVectorization
#using SIMD

@time begin
    rs = LinRange(0.0,6001) 
    #rs = collect(rs)
    x1 = 0.5
    maxiter = 1000 # maximum iterations
    x = zeros(length(rs),maxiter) # for each starting condition (across rows)
    #for k = 1:length(rs)
    for k in eachindex(rs)
        x[k,1] = x1 # initial condition
        for j = 1 : maxiter-1
            x[k,j+1] = rs[k] * x[k,j] * (1 - x[k,j])
        end
    end
end

@btime begin
    plot(rs,x[:,end-50:end],#avoiding transients
    seriestype = :scatter,markercolor=:blue,markerstrokecolor=:match,markersize = 1,markerstrokewidth = 0,legend = false,markeralpha = 0.3)
    #xticks! = 0:1:4
    xlims!(0.01,4)
end

解决方法

@time begin
    rs = LinRange(0.0,4.0,6001) 
    x1 = 0.5
    maxiter = 1000 # maximum iterations
    x = zeros(length(rs),maxiter)
    for k in eachindex(rs)
        x[k,1] = x1 # initial condition
        for j = 1 : maxiter-1
            x[k,j+1] = rs[k] * x[k,j] * (1 - x[k,j])
        end
    end
end

显示：

1.490238 秒（44.79 M 分配：820.892 MiB，5.81% gc 时间）

而

@time begin
    let
        rs = LinRange(0.0,6001)
        x1 = 0.5
        maxiter = 1000 # maximum iterations
        x = zeros(length(rs),maxiter)
        for k in eachindex(rs)
            x[k,1] = x1 # initial condition
            for j = 1 : maxiter-1
                x[k,j])
            end
        end
    end
end

节目

0.044452 秒（2 次分配：45.784 MiB，29.09% 的 gc 时间）

let 引入了一个新的范围，所以您的问题是您在全局范围内运行。

编译器发现很难在全局范围内优化代码，因为可以从源代码（可能是 1000 行）中的任何位置访问任何变量。如 the manual 中所述，这是代码运行速度比预期慢的第一大原因。

我不知道你为什么被标记，也许这种交流方式不太适合StackOverflow，我可以推荐使用https://discourse.julialang.org，它更适合长时间讨论。

关于您的代码，有几件事可以改进。

using Plots
using BenchmarkTools
using LoopVectorization

function bifur(rs,xs,maxiter)
    x = zeros(length(rs),maxiter) # for each starting condition (across rows)
    bifur!(x,rs,maxiter)
end

function bifur!(x,maxiter)
    # @avx - LoopVectorization is broken on julia nightly,so I had to switch to other options
    @inbounds @simd for k = 1 : length(rs) # macro to vectorize the loop
        x[k,1] = xs # initial condition
        for j = 1 : maxiter-1
            x[k,j+1] =  rs[k] * x[k,j])
        end
    end

    return x
end

如您所见，我将 bifur 拆分为两个函数：mutating（用感叹号表示）和 non mutating（仅命名为 bifur）。这种模式在 Julia 中很常见，有助于正确地进行基准测试和使用代码。如果您想要不分配的更快版本，则使用变异版本。如果您想要具有保证结果的较慢版本（即它不会在不同的运行之间改变），您可以使用非变异版本。

您可以在此处查看基准测试结果

julia> @benchmark bifur($rs,$xs,$maxiter)
BenchmarkTools.Trial:
  memory estimate:  45.78 MiB
  allocs estimate:  2
  --------------
  minimum time:     38.556 ms (0.00% GC)
  median time:      41.440 ms (0.00% GC)
  mean time:        45.371 ms (10.08% GC)
  maximum time:     170.765 ms (77.25% GC)
  --------------
  samples:          111
  evals/sample:     1

这看起来很合理 - 最佳运行时间是 38 毫秒，2 次分配显然来自 x。还要注意变量 xs 和其他变量是用 $ 符号插入的。它有助于正确进行基准测试，更多内容可以在 BenchmarkTools.jl manual 中阅读。

我们可以将其与变异版本进行比较

julia> @benchmark bifur!(x,$rs,$maxiter) setup=(x = zeros(length($rs),$maxiter))
evals = 1
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     25.574 ms (0.00% GC)
  median time:      27.556 ms (0.00% GC)
  mean time:        27.717 ms (0.00% GC)
  maximum time:     35.223 ms (0.00% GC)
  --------------
  samples:          98
  evals/sample:     1

我不得不添加 setup phase，现在您可以看到没有分配，正如预期的那样，它运行速度稍快，13 毫秒的差异是 x 初始化。

请注意：您的基准测试结果是错误的，因为您没有在运行之间重置 x（没有设置阶段），所以 x 在第一次运行时已正确初始化，但在所有其他运行没有执行额外的计算。所以你会得到 100 次运行，第一次运行等于 30 毫秒，所有其他运行等于 4 毫秒，所以平均你得到 4 毫秒。

现在，您可以直接使用此功能

x = bifur(rs,maxiter)

@btime begin
    plot($rs,@view($x[:,end-50:end]),#avoid transients
    seriestype = :scatter,markercolor=:blue,markerstrokecolor=:match,markersize = 1.1,markerstrokewidth = 0,legend = false,markeralpha = 0.3)
    #xticks! = 0:1:4
    xlims!(2.75,4)
    #savefig("zee_bifurcation.pdf")
end

请注意，这里我再次使用插值，并使用 @view 进行正确的数组切片。问题是，像 x[:,end-50:end] 这样的表达式会创建数组的新副本，这有时会减慢计算速度。当然，它在 Plots.jl 的情况下并不重要，但在其他计算中可能有用。

我对上面展示的逻辑地图的分叉图的 Julia 代码风格和性能不佳感到沮丧。我是 Julia 的新手，希望对有经验的 Julia 程序员来说并不难的问题有所帮助。好吧，在寻求帮助时，我的头被旗帜击倒，告诉我我偏离了我的问题或其他什么。我没有。

我现在感觉好多了。下面的代码运行两个循环（在函数内），在 6001×1000 Array{Float64,2} 中填充一个 3.276 ms (0 allocations: 0 bytes)，然后在 Atom 的绘图窗格中的 17.660 ms (83656 allocations: 11.32 MiB) 中显示绘图。我认为它仍然可以改进，但是（对我而言）它现在还可以。朱莉娅摇滚。

using Plots
using BenchmarkTools
using LoopVectorization

function bifur(rs,x,maxiter)
    @avx for k = 1 : length(rs) # macro to vectorize the loop
        x[k,j])
        end
    end
    #return x
end

rs = LinRange(0.0,6001) #rs = collect(rs)
xs = 0.5
maxiter = 1000 # maximum iterations
x = zeros(length(rs),maxiter) # for each starting condition (across rows)

@btime begin 
    bifur(rs,maxiter) 
end

@benchmark bifur(rs,maxiter)

@btime begin
    plot(rs,x[:,end-50:end],4)
    #savefig("zee_bifurcation.pdf")
end

情节应该是这样的：

有没有人知道如何将 Brunton 上面的 Matlab 代码改编为 Julia？我有点喜欢他的风格。用于教学看起来很漂亮。

与此同时，我了解到 @avx 中的 LoopVectorization.jl 宏不应该应用于两个不独立的循环（这两个循环就是这种情况），如果我更改循环顺序，性能将提高 5 倍。因此，为逻辑图生成分叉图的更好的代码将是这样的：

using Plots
using BenchmarkTools
#using LoopVectorization
using SIMD

function bifur!(rs,maxiter)
    x[:,1] .= xs
    for j in 1:maxiter-1 #the j loop needs to be sequential,so should be the outer loop
        @inbounds @simd for k in 1:length(rs) # the k loop should be the innermost since it is first on x.
            x[k,j])
        end
    end
    return x
end

rs = LinRange(0.0,maxiter)

bifur!(rs,maxiter)

@benchmark bifur!($rs,$maxiter)) evals=1

@btime begin
    plot(rs,#avoid transients
    #plot($rs,seriestype = :scatter,4)
    #savefig("logistic_bifur.pdf")
end