Inconsistent performance when writing pixels to a custom buffer (X11 / C)

I'm writing my own library on top of X11 that renders using only the CPU (no GPU allowed!).
Today I stumbled on something odd. When I draw pixels like this:

const int ratio = 1;
for(int y = 0; y < XF_GetwindowHeight() / ratio; ++y)
    for(int x = 0; x < XF_GetwindowWidth() / ratio; ++x)
        XF_DrawPoint(x * ratio,y * ratio,0xff0000); // function gets x,y and color

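each frame finishes in about 1.5 ms. However, when I cover the same area with fewer loop iterations, drawing larger rectangles instead of single points (pixels), I get about 0.8 ms.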

const int ratio = 32;
for(int y = 0; y < XF_GetwindowHeight() / ratio; ++y)
    for(int x = 0; x < XF_GetwindowWidth() / ratio; ++x)
        // function gets x,y,w,h,color and whether to draw only the outline (doesn't matter in this case)
        XF_DrawRect(x * ratio,y * ratio,ratio,ratio,0xff0000,false);

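I find this strange: even though XF_DrawRect is the more complex function and both loops end up drawing the same number of pixels, making fewer calls seems to have a large effect on performance.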
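One way to reproduce this comparison without X11 is to time both loop shapes against a plain in-memory buffer. The sketch below is only an assumption-laden stand-in: it uses a flat width*height buffer of 32-bit pixels, the names draw_point, draw_rect, fill_by_points, fill_by_rects and now_ms are hypothetical (they are not part of the library in the question), and timing uses POSIX clock_gettime with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for XF_DrawPoint: writes one pixel into a flat
   row-major buffer. No bounds checking. */
static void draw_point(uint32_t *buf, int width, int x, int y, uint32_t c)
{
    buf[(size_t)y * (size_t)width + (size_t)x] = c;
}

/* Hypothetical stand-in for XF_DrawRect (filled, no clipping). */
static void draw_rect(uint32_t *buf, int width, int x, int y, int w, int h, uint32_t c)
{
    uint32_t *s = buf + (size_t)y * (size_t)width + (size_t)x;
    for (int row = 0; row < h; ++row, s += width)
        for (int col = 0; col < w; ++col)
            s[col] = c;
}

/* Loop 1 from the question: one call per pixel (ratio = 1). */
void fill_by_points(uint32_t *buf, int width, int height, uint32_t c)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            draw_point(buf, width, x, y, c);
}

/* Loop 2 from the question: one call per ratio x ratio tile. Assumes width
   and height are multiples of ratio so both loops cover the same pixels. */
void fill_by_rects(uint32_t *buf, int width, int height, int ratio, uint32_t c)
{
    assert(ratio > 0 && width % ratio == 0 && height % ratio == 0);
    for (int y = 0; y < height / ratio; ++y)
        for (int x = 0; x < width / ratio; ++x)
            draw_rect(buf, width, x * ratio, y * ratio, ratio, ratio, c);
}

/* Monotonic time in milliseconds, via POSIX clock_gettime. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}
```

Typical use would be to call now_ms() before and after each fill on a window-sized buffer and compare the two differences. Because the helpers here live in the same file, the compiler may inline draw_point depending on the optimization level, so this harness may not behave like calls into a separately compiled XF_DrawPoint.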

void XF_DrawPoint(int x,int y,uint32_t color)
{
    *(h_lines[y] + x) = color; // h_lines is an array of pointers to each row
}

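void XF_DrawRect(int x,int y,int w,int h,uint32_t color,XF_Bool outline)
{
    // (range checks that trim the rectangle to the window are omitted here)
    uint32_t *s = h_lines[y] + x;
    int hz_count = 0;

    while(h--) {
        hz_count = w;

        while(hz_count--) {
            *s++ = color;
        }

        s += WINDOW_WIDTH - w; // move to the rectangle's left edge on the next row
    }
}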
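Both functions rely on h_lines, an array of pointers to the start of each row. A minimal sketch of such a row-pointer table over one contiguous pixel block is shown below; the Framebuffer type and the fb_init, fb_free and fb_draw_point names are hypothetical, and the library in the question may allocate its buffer differently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical framebuffer: one contiguous block of 32-bit pixels plus a
   table of row pointers, similar in spirit to h_lines. */
typedef struct {
    int width, height;
    uint32_t *pixels;   /* width * height pixels, row-major */
    uint32_t **rows;    /* rows[y] points at the first pixel of row y */
} Framebuffer;

/* Releases both allocations; safe to call on a partially initialized fb. */
void fb_free(Framebuffer *fb)
{
    free(fb->pixels);
    free(fb->rows);
    fb->pixels = NULL;
    fb->rows = NULL;
}

/* Allocates a zeroed pixel block and fills the row-pointer table.
   Returns 0 on success, -1 on allocation failure. */
int fb_init(Framebuffer *fb, int width, int height)
{
    assert(width > 0 && height > 0);
    fb->width = width;
    fb->height = height;
    fb->pixels = calloc((size_t)width * (size_t)height, sizeof *fb->pixels);
    fb->rows = malloc((size_t)height * sizeof *fb->rows);
    if (!fb->pixels || !fb->rows) {
        fb_free(fb);
        return -1;
    }
    for (int y = 0; y < height; ++y)
        fb->rows[y] = fb->pixels + (size_t)y * (size_t)width;
    return 0;
}

/* Same access pattern as XF_DrawPoint: no bounds check, caller stays in range. */
static inline void fb_draw_point(Framebuffer *fb, int x, int y, uint32_t color)
{
    fb->rows[y][x] = color;
}
```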

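So, as you can see, XF_DrawRect has a more complex implementation than XF_DrawPoint (at the start it also runs some range checks that trim the rectangle if it extends past the window, but that is irrelevant here), and yet it is about 2x faster when drawing the same area.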
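The range checks are not shown in the question. One possible way to trim a rectangle to the window before filling it is sketched below; the function name fill_rect_clipped and its parameters are hypothetical, and it takes the buffer and its dimensions explicitly instead of using h_lines and WINDOW_WIDTH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clipped rectangle fill. buf points at a width*height block of
   32-bit pixels, row-major. The rectangle is trimmed to the buffer first. */
void fill_rect_clipped(uint32_t *buf, int width, int height,
                       int x, int y, int w, int h, uint32_t color)
{
    assert(buf != NULL && width > 0 && height > 0);

    /* Trim against the left and top edges. */
    if (x < 0) { w += x; x = 0; }
    if (y < 0) { h += y; y = 0; }
    /* Trim against the right and bottom edges. */
    if (x + w > width)  w = width - x;
    if (y + h > height) h = height - y;
    /* Nothing left to draw: empty or entirely outside the buffer. */
    if (w <= 0 || h <= 0)
        return;

    uint32_t *s = buf + (size_t)y * (size_t)width + (size_t)x;
    for (int row = 0; row < h; ++row) {
        for (int col = 0; col < w; ++col)
            s[col] = color;   /* contiguous writes within one row */
        s += width;           /* same column, next row */
    }
}
```

With this shape the checks run once per rectangle rather than once per pixel, so for a 32x32 tile their cost is spread over 1024 writes.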

My question is: why?