Writing to a Linux pipe is faster than writing to a file, but at the kernel level, why?

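I am investigating the speed of writing to a file versus writing to a pipe. Please see this code, which writes to a file handle unless a command-line argument is given, in which case it writes to a pipe: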

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <chrono>
#include <string.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

using namespace std;


void do_write(int fd)
{
    const char* data = "Hello World!";
    int to_write = strlen(data),total_written = 0;
    
    int x = 0;
    auto start = chrono::high_resolution_clock::now();

    
    while (x < 50000)
    {       
        int written = 0;
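        while (written != to_write)
        {
            // write() may write fewer bytes than requested, or fail with -1
            ssize_t n = write(fd,data + written,to_write - written);
            if (n == -1)
            {
                perror("write");
                exit(EXIT_FAILURE);
            }
            written += n;
        }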
        total_written += written;
        ++x;
    }
    auto end = chrono::high_resolution_clock::now();

    auto diff = end - start;
    
    cout << "Total bytes written: " <<  total_written << " in " << chrono::duration<double,milli>(diff).count() 
        << " milliseconds," << endl;
}
    
int main(int argc,char *argv[])
{  
    //
    // Write to file if we have not specified any extra argument
    //
    
    if (argc == 1)
    {   
        {   
            int fd = open("test.txt",O_WRONLY | O_TRUNC | O_CREAT,0655);
            if (fd == -1) return -1;
            do_write(fd);
        }   
        
        return 0;
    }
    
    //
    // Otherwise, write to pipe
    //
    int the_pipe[2];
    if (pipe(the_pipe) == -1) return -1;
    
    pid_t child = fork();
    switch (child)
    {
    case -1:
        {
            return -1;
        }
    case 0:
        {
            char buf[128];
            int bytes_read = 0,total_read = 0;
            close(the_pipe[1]);
            while (true)
            {
                // Stop at end-of-file (0) and also on a read error (-1)
                if ((bytes_read = read(the_pipe[0],buf,128)) <= 0)
                    break;
                total_read += bytes_read;
            }
            cout << "Child: Total bytes read: " << total_read << endl;
            break;
        }
    default:
        {
            close(the_pipe[0]);
            do_write(the_pipe[1]);
            break;
        }
    }
    return 0;
}

Here is my output:

$ time ./LinuxFlushTest pipe
Total bytes written: 600000 in 59.6544 milliseconds,

real    0m0.064s
user    0m0.020s
sys     0m0.040s
Child: Total bytes read: 600000

$ time ./LinuxFlushTest
Total bytes written: 600000 in 154.367 milliseconds,

real    0m0.159s
user    0m0.028s
sys     0m0.132s

From both the time output and the timing in my C++ code, you can see that writing to the pipe is much faster than writing to the file.

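Now, as far as I understand, when we call write(), the data is copied into a kernel buffer, at which point a pdflush-style thread actually flushes it from the page cache to the underlying file. I am not forcing this flush in my code, so there are no disk seek delays.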

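But what I don't know (and can't seem to find out: yes, I have looked at the kernel code but got lost in it, so no "look at the code" comments, please) is what happens differently when writing to a pipe: isn't it just a block of memory in the kernel that the child can read from? In that case, why is it so much faster than the basically identical process of writing to a file?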

Solution

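"Now, as far as I understand, when we call write(), the data is copied into a kernel buffer, at which point a pdflush-style thread actually flushes it from the page cache to the underlying file. I am not forcing this flush in my code, so there are no disk seek delays."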

You seem to have some misconceptions, including:

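You do not need to do anything explicit for the kernel to flush written data to the underlying output device. It may, at its discretion, cache some or even all of the data in memory for a while, but you can expect the kernel to write the data out at some point, even without an explicit instruction from userspace. This may be influenced by the amount of data written, which in your case appears to be 600000 bytes.
Disk seeks are not the only reason that disk I/O is (comparatively) slow. Even I/O to an SSD is slower than a memory-only data transfer.
Among other things, a standard filesystem is more than just a flat span of bytes. Even with no moving parts involved, the kernel still has to interact with the filesystem's data structures to work out where to write, and to update them after the write. That information is generally expected to be visible to other processes right away, so it is not usually deferred indefinitely.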
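The last point can be observed from userspace. The following is a minimal sketch, not taken from the original post: the helper name file_size_after_write and the path passed to it are hypothetical. It writes to a fresh file, never calls fsync(), and asks the kernel for the file size straight away. The size already reflects the write, which suggests that write() to a regular file includes filesystem bookkeeping that a pipe does not need.

```cpp
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of the original program): write `data` to a
// fresh file at `path` and return the size the kernel reports immediately
// afterwards, or -1 on any failure.
long file_size_after_write(const char* path, const char* data)
{
    int fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, 0644);
    if (fd == -1) return -1;

    size_t len = strlen(data);
    size_t done = 0;
    while (done < len)
    {
        ssize_t n = write(fd, data + done, len - done);
        if (n == -1) { close(fd); return -1; }
        done += n;
    }

    // No fsync() here: the data may still sit only in the page cache, yet
    // the file's metadata (its size) was already updated by write().
    struct stat st;
    long size = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;

    close(fd);
    unlink(path);   // clean up the scratch file
    return size;
}
```

Calling file_size_after_write("/tmp/size_demo.txt", "Hello World!") from main() should report the length of the string, even though nothing forced the data to disk.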

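"But what I don't know [...] is what happens differently when writing to a pipe: isn't it just a block of memory somewhere in the kernel that the child can read from?"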

There is a bit more to it than that, but it is a reasonable first approximation.
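To make that approximation concrete, here is a small sketch with hypothetical helper names (pipe_round_trip and pipe_capacity are not from the original post). It pushes a short message through a pipe inside a single process, then asks the kernel how large the pipe's buffer is. It assumes Linux, where fcntl() accepts F_GETPIPE_SZ; on other systems the capacity helper simply returns -1.

```cpp
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

// Hypothetical helper: send `msg` through a pipe within one process and
// return the number of bytes read back into `out`, or -1 on failure.
long pipe_round_trip(const char* msg, char* out, size_t out_len)
{
    int fds[2];
    if (pipe(fds) == -1) return -1;

    // A short message fits in the pipe's in-kernel buffer, so write()
    // returns without any reader having to be waiting.
    ssize_t written = write(fds[1], msg, strlen(msg));
    close(fds[1]);
    if (written == -1) { close(fds[0]); return -1; }

    ssize_t got = read(fds[0], out, out_len);
    close(fds[0]);
    return got;
}

// Hypothetical helper: report the capacity of a new pipe's buffer in bytes,
// or -1 if it cannot be determined (F_GETPIPE_SZ is Linux-specific).
long pipe_capacity()
{
    int fds[2];
    if (pipe(fds) == -1) return -1;
#ifdef F_GETPIPE_SZ
    long cap = fcntl(fds[0], F_GETPIPE_SZ);
#else
    long cap = -1;
#endif
    close(fds[0]);
    close(fds[1]);
    return cap;
}
```

The round trip involves no file, no filesystem and no storage device: the bytes only pass through a bounded buffer in kernel memory. Once that buffer is full, a blocking write() waits until the reader drains it.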

"In that case, why is it so much faster than the basically identical process of writing to a file?"

Because writing to a regular file is not basically the same process. There is a lot more to it.