Why does performance degrade only linearly even when the data exceeds the CPU cache?

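I am trying to work out why the run time in my benchmark grows only linearly with the array size. As far as I know, my L1 data cache is 32 KiB. I expected to see a sharp jump at some point in the benchmark, once the array no longer fits in cache, but there is none. Does anyone know what is wrong?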

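Is my assumption correct that iterating over a small array that fits in L1 (for example, 512 bytes) should be orders of magnitude faster per element than iterating over an array that has to be served from L3 or RAM?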
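To make the sizes in the question concrete, the following is a minimal sketch that converts a `longs` parameter into bytes and compares it with the 32 KiB L1 data cache mentioned above. The class and method names are hypothetical, and the calculation counts only the array payload (8 bytes per `long`), ignoring the small array object header.

```java
public class WorkingSet {
    // 32 KiB L1 data cache, as stated in the question.
    static final long L1D_BYTES = 32 * 1024;

    // Payload size of a long[] with the given length (object header ignored).
    static long payloadBytes(int longs) {
        return (long) longs * Long.BYTES;
    }

    // True if the payload alone is no larger than the L1 data cache.
    static boolean fitsInL1(int longs) {
        return payloadBytes(longs) <= L1D_BYTES;
    }

    public static void main(String[] args) {
        int[] sizes = {64, 512, 4096, 8192, 65536, 524288};
        for (int n : sizes) {
            System.out.printf("%7d longs = %8d bytes, fits in L1d: %b%n",
                    n, payloadBytes(n), fitsInL1(n));
        }
    }
}
```

Under these assumptions, 64 longs is the 512-byte case from the question, and 4096 longs is the largest benchmarked size whose payload still fits in 32 KiB.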

System information

Benchmark results on my machine

Benchmark                      (longs)  Mode  Cnt      score    Error  Units
AdditionBenchmark.addition       64  avgt    5      15.899 ±    0.434  us/op
AdditionBenchmark.addition      128  avgt    5      31.062 ±    0.855  us/op
AdditionBenchmark.addition      256  avgt    5      61.062 ±    0.524  us/op
AdditionBenchmark.addition      512  avgt    5     121.283 ±    3.414  us/op
AdditionBenchmark.addition     1024  avgt    5     243.102 ±    2.684  us/op
AdditionBenchmark.addition     2048  avgt    5     483.627 ±    3.025  us/op
AdditionBenchmark.addition     4096  avgt    5     969.184 ±   20.331  us/op
AdditionBenchmark.addition     8192  avgt    5    1948.989 ±   43.016  us/op
AdditionBenchmark.addition    16384  avgt    5    3848.305 ±   95.566  us/op
AdditionBenchmark.addition    32768  avgt    5    7755.592 ±  205.539  us/op
AdditionBenchmark.addition    65536  avgt    5   16663.239 ±  388.075  us/op
AdditionBenchmark.addition   131072  avgt    5   33894.256 ± 1218.559  us/op
AdditionBenchmark.addition   262144  avgt    5   67138.158 ±  512.422  us/op
AdditionBenchmark.addition   524288  avgt    5  139241.696 ± 9124.094  us/op
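One way to read the table above is to normalize each score to a cost per array element. The sketch below does that for a few rows copied from the table, assuming the 1000 outer passes used by the benchmark method; the class and method names are hypothetical. If the memory hierarchy showed up in this benchmark, the per-element cost would rise with the array size rather than stay flat.

```java
public class PerElementCost {
    // The benchmark method walks the whole array 1000 times per operation.
    static final int PASSES = 1000;

    // Converts a JMH score in us/op into nanoseconds per element visited.
    static double nsPerElement(double usPerOp, int longs) {
        return usPerOp * 1000.0 / ((double) longs * PASSES);
    }

    public static void main(String[] args) {
        // (longs, us/op) pairs copied from the first results table.
        int[] longs = {64, 4096, 32768, 524288};
        double[] scores = {15.899, 969.184, 7755.592, 139241.696};

        for (int i = 0; i < longs.length; i++) {
            System.out.printf("%7d longs: %.3f ns per element%n",
                    longs[i], nsPerElement(scores[i], longs[i]));
        }
    }
}
```

For these rows the cost stays roughly a quarter of a nanosecond per element from 512 bytes up to 4 MiB. A plausible reading, not something the measurements alone prove, is that a purely sequential scan is easy for the hardware prefetcher and the JIT compiler to optimize, so cache misses are largely hidden.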

Benchmark code

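import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 5, time = 2)
@Measurement(iterations = 5, time = 2)
@Fork(value = 1, jvmArgsAppend = {
        "-Xmx4G", "-Xms4G" // fixed 4 GiB heap; JVM options are case-sensitive
})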
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class AdditionBenchmark {

    long[] array;

    @Param({"64","128","256","512","1024","2048","4096","8192","16384","32768","65536","131072","262144","524288"})
    int longs;

    @Setup(Level.Trial)
    public void setup() {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        array = r.longs().limit(longs).toArray();
    }

    @Benchmark
    public int addition() {
        int c = 0;

        for (int i = 0; i < 1000; i++) {
            for (long w : array) {
                c += w;
            }
        }

        return c;
    }

    public static void main(String[] args) throws IOException {
        org.openjdk.jmh.Main.main(args);
    }
}

UPDATE 1: a more random walk, but the results are still almost linear

Benchmark                   (longs)  Mode  Cnt       score      Error  Units
AdditionBenchmark.addition       64  avgt    5     108.005 ±    9.939  us/op
AdditionBenchmark.addition      128  avgt    5     224.575 ±    5.631  us/op
AdditionBenchmark.addition      256  avgt    5     446.561 ±    3.309  us/op
AdditionBenchmark.addition      512  avgt    5     891.957 ±    9.686  us/op
AdditionBenchmark.addition     1024  avgt    5    1790.980 ±   65.162  us/op
AdditionBenchmark.addition     2048  avgt    5    3572.439 ±  108.912  us/op
AdditionBenchmark.addition     4096  avgt    5    7136.228 ±   42.287  us/op
AdditionBenchmark.addition     8192  avgt    5   14236.296 ±  224.648  us/op
AdditionBenchmark.addition    16384  avgt    5   28723.188 ±  832.471  us/op
AdditionBenchmark.addition    32768  avgt    5   57640.325 ± 1562.439  us/op
AdditionBenchmark.addition    65536  avgt    5  181636.470 ± 6433.434  us/op

@Benchmark
public int addition() {
    int c = 0;
    int mask = array.length - 1;

    for (int i = 0; i < 1000; i++) {
        for (int j = 0; j < array.length; j++) {
            c += array[c & mask];
        }
    }

    return c;
}
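A possible explanation for the near-linear numbers in Update 1: the next index `c & mask` depends only on the low bits of `c`, and those low bits are updated using only the element at the current index. The walk is therefore a fixed function from index to next index, and such a walk must fall into a cycle, which may be much shorter than the array. The sketch below (hypothetical class and method names, a seeded `java.util.Random` in place of `ThreadLocalRandom`) counts how many distinct slots the Update 1 walk actually touches. Note that `mask = array.length - 1` only works as a modulo when the length is a power of two, as it is for every `@Param` value.

```java
import java.util.BitSet;
import java.util.Random;

public class WalkCoverage {
    // Replays the Update 1 access pattern and counts the distinct indices read.
    static int distinctIndices(long[] array, int passes) {
        BitSet seen = new BitSet(array.length);
        int c = 0;
        int mask = array.length - 1; // valid only for power-of-two lengths

        for (int i = 0; i < passes; i++) {
            for (int j = 0; j < array.length; j++) {
                int idx = c & mask;
                seen.set(idx);
                c += array[idx]; // same narrowing compound add as the benchmark
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        int[] sizes = {1024, 65536};
        for (int n : sizes) {
            // Seeded so repeated runs of this sketch give the same answer.
            long[] array = new Random(42).longs(n).toArray();
            int touched = distinctIndices(array, 1000);
            System.out.printf("%6d longs: %d distinct indices touched%n", n, touched);
        }
    }
}
```

If the count comes out far below the array length, the walk is rereading a small, cache-resident subset of the array, which would explain why growing the array does not make each access slower.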

UPDATE 2: now the results make sense

Benchmark                   (longs)  Mode  Cnt       score       Error  Units

L1
AdditionBenchmark.addition       64  avgt    5     127.561 ±     6.875  us/op
AdditionBenchmark.addition      128  avgt    5     251.343 ±     5.783  us/op
AdditionBenchmark.addition      256  avgt    5     502.992 ±     6.485  us/op
AdditionBenchmark.addition      512  avgt    5    1002.569 ±     4.776  us/op
AdditionBenchmark.addition     1024  avgt    5    2008.738 ±    51.341  us/op
AdditionBenchmark.addition     2048  avgt    5    4020.453 ±    37.940  us/op
AdditionBenchmark.addition     4096  avgt    5    8058.776 ±    99.703  us/op

L2
AdditionBenchmark.addition     8192  avgt    5   22918.915 ±   263.238  us/op
AdditionBenchmark.addition    16384  avgt    5   53222.615 ±  1162.671  us/op
AdditionBenchmark.addition    32768  avgt    5  117576.770 ±  2098.845  us/op

L3
AdditionBenchmark.addition    65536  avgt    5  528979.627 ± 16870.041  us/op

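Main memory
Going from 1048576 to 2097152 longs costs roughly another 5x slowdown, but those runs take too long to complete.

@Setup // new approach for setup
public void setup() {
    // A permutation of 0..longs-1, so every index appears exactly once as a value.
    array = LongStream.range(0, longs).toArray();
    // ArrayUtils is org.apache.commons.lang3.ArrayUtils (Apache Commons Lang 3).
    ArrayUtils.shuffle(array);
}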

@Benchmark
public int addition() {
    int c = 0;
    int mask = array.length - 1;

    for (int i = 0; i < 1000; i++) {
        for (int j = 0; j < array.length; j++) {
            c += array[(c ^ j) & mask];
        }
    }

    return c;
}
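The L1, L2 and L3 grouping in the Update 2 table can be checked by looking at how much the score grows each time the array size doubles. While the array stays within one cache level, doubling the work should roughly double the time; a ratio clearly above 2 suggests the walk has spilled into a slower level. This is a minimal sketch with hypothetical names, using the scores copied from the Update 2 table.

```java
public class DoublingRatio {
    // How many times slower the larger run is than the run at half the size.
    static double ratio(double largerUs, double smallerUs) {
        return largerUs / smallerUs;
    }

    public static void main(String[] args) {
        // (longs, us/op) pairs copied from the Update 2 table.
        int[] longs = {64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536};
        double[] scores = {127.561, 251.343, 502.992, 1002.569, 2008.738, 4020.453,
                8058.776, 22918.915, 53222.615, 117576.770, 528979.627};

        for (int i = 1; i < longs.length; i++) {
            System.out.printf("%6d -> %6d longs: x%.2f%n",
                    longs[i - 1], longs[i], ratio(scores[i], scores[i - 1]));
        }
    }
}
```

With these numbers the ratio stays close to 2 up to 4096 longs (32 KiB), then exceeds 2 at the step to 8192 longs and again, more strongly, at the step to 65536 longs, which matches the boundaries marked in the table.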
