MPI size and number of OpenMP threads

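I am trying to write a hybrid OpenMP/MPI program, so I want to understand how the number of OpenMP threads relates to the number of MPI processes. To that end, I wrote a small test program: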

#include <iostream>
#include <mpi.h>
#include <thread>
#include <sstream>
#include <omp.h>

int main(int args,char *argv[]) {
    int rank,nprocs,thread_id,nthreads,cxx_procs;
    MPI_Init(&args,&argv);

    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);

    // nthreads is written by every thread, so it must be private as well
    #pragma omp parallel private(thread_id,nthreads,cxx_procs)
    {
        thread_id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
        cxx_procs = std::thread::hardware_concurrency();
        std::stringstream omp_stream;
        omp_stream << "I'm thread " << thread_id 
        << " out of " << nthreads 
        << " on MPI process nr. " << rank 
        << " out of " << nprocs 
        << ",while hardware_concurrency reports " << cxx_procs 
        << " processors\n";
        std::cout << omp_stream.str();
    }

    MPI_Finalize();
    return 0;
}

Compiled with

mpicxx -fopenmp -std=c++17 -o omp_mpi source/main.cpp -lgomp

using gcc-9.3.1 and OpenMPI 3. When I run it with ./omp_mpi on an i7-6700 (4c/8t), I get the following output:

I'm thread 1 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 3 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 6 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 7 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 2 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 5 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 4 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 0 out of 8 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors

i.e. as expected.
When I run it with mpirun -n 1 omp_mpi, I would expect the same, but I get

I'm thread 0 out of 2 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors
I'm thread 1 out of 2 on MPI process nr. 0 out of 1,while hardware_concurrency reports 8 processors

Where are the other threads? When I run it on two MPI processes, I get

I'm thread 0 out of 2 on MPI process nr. 1 out of 2,while hardware_concurrency reports 8 processors
I'm thread 1 out of 2 on MPI process nr. 1 out of 2,while hardware_concurrency reports 8 processors
I'm thread 0 out of 2 on MPI process nr. 0 out of 2,while hardware_concurrency reports 8 processors
I'm thread 1 out of 2 on MPI process nr. 0 out of 2,while hardware_concurrency reports 8 processors

i.e. still only two OpenMP threads. But when I run it on four MPI processes, I get

I'm thread 1 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 3 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 1 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 7 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 0 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 4 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 6 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 2 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 6 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 0 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 2 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 3 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 3 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 6 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 0 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 4 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 6 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 7 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 1 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 1 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 7 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 4 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 0 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 4 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors
I'm thread 5 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 5 out of 8 on MPI process nr. 0 out of 4,while hardware_concurrency reports 8 processors
I'm thread 3 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 5 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 7 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 2 out of 8 on MPI process nr. 2 out of 4,while hardware_concurrency reports 8 processors
I'm thread 2 out of 8 on MPI process nr. 1 out of 4,while hardware_concurrency reports 8 processors
I'm thread 5 out of 8 on MPI process nr. 3 out of 4,while hardware_concurrency reports 8 processors

Now suddenly I get eight OpenMP threads per MPI process. Where does this change come from?

Solution

The man page of mpirun explains:

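If you are simply looking for how to run an MPI application, you probably want to use a command line of the following form:

  % mpirun [ -np X ] [ --hostfile <filename> ]  <program>

This will run X copies of <program> in your current run-time environment (...)

Please note that mpirun automatically binds processes as of the start of the v1.8 series. Three binding patterns are used in the absence of any further directives:

  Bind to core:     when the number of processes is <= 2
  Bind to socket:   when the number of processes is > 2
  Bind to none:     when oversubscribed

If your application uses threads, then you probably want to ensure that you are either not bound at all (by specifying --bind-to none), or bound to multiple cores using an appropriate binding level or specific number of processing elements per application process.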

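So if you specify 1 or 2 MPI processes, mpirun defaults to --bind-to core. Each core of your CPU exposes two hardware threads, which results in 2 threads per MPI process. If, however, you specify 4 MPI processes, mpirun defaults to --bind-to socket and you get 8 threads per process, because your machine has a single socket. I tested this on a laptop (1s/2c/4t) and on a workstation (2 sockets, 12 cores per socket, 2 threads per core), and the program (run without the np argument) behaves as described above: on the workstation there are 24 MPI processes with 24 OpenMP threads each.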
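The rule quoted from the man page can be written down as a small model to check the numbers seen in the question. This is only a sketch: the Node struct and both function names are hypothetical, and "oversubscribed" is assumed here to mean more processes than physical cores.

```cpp
#include <cassert>

// Default binding chosen by mpirun according to the man page excerpt.
enum class Binding { Core, Socket, None };

// Hypothetical description of a single node's topology.
struct Node {
    int sockets;
    int cores_per_socket;
    int threads_per_core;
};

// Assumption: "oversubscribed" means more processes than physical cores.
Binding default_binding(int nprocs, const Node& node) {
    assert(nprocs > 0);
    const int cores = node.sockets * node.cores_per_socket;
    if (nprocs > cores) return Binding::None;
    return nprocs <= 2 ? Binding::Core : Binding::Socket;
}

// Logical CPUs each process may run on under that default binding.
int logical_cpus_per_process(int nprocs, const Node& node) {
    switch (default_binding(nprocs, node)) {
        case Binding::Core:
            return node.threads_per_core;
        case Binding::Socket:
            return node.cores_per_socket * node.threads_per_core;
        case Binding::None:
            break;
    }
    // Unbound: the whole node is visible.
    return node.sockets * node.cores_per_socket * node.threads_per_core;
}
```

For the i7-6700, Node{1, 4, 2}, this gives 2 logical CPUs for one or two processes and 8 for four, which matches the thread counts in the question. For the workstation, Node{2, 12, 2} with 24 processes gives 24.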

You are observing an interaction between a peculiarity of Open MPI and the GNU OpenMP runtime libgomp.

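First, the number of threads in OpenMP is controlled by the nthreads-var ICV (internal control variable), and the way to set it is either to call omp_set_num_threads() or to set OMP_NUM_THREADS in the environment. When OMP_NUM_THREADS is not set and omp_set_num_threads() is not called, the runtime is free to choose whatever it deems a reasonable default. In the case of libgomp, the manual says:

OMP_NUM_THREADS

Specifies the default number of threads to use in parallel regions. The value of this variable shall be a comma-separated list of positive integers; the value specifies the number of threads to use for the corresponding nested level. Specifying more than one item in the list will automatically enable nesting by default. If undefined one thread per CPU is used.

What it fails to mention is that it uses various heuristics to determine the right number of CPUs. On Linux and Windows, the process affinity mask is used for that (if you like reading code, the Linux variant is in the libgomp source). If the process is bound to a single logical CPU, you only get one thread:

$ taskset -c 0 ./omp_mpi
I'm thread 0 out of 1 on MPI process nr. 0 out of 1,while hardware_concurrency reports 12 processors

If you bind it to several logical CPUs, their count is used:

$ taskset -c 0,2,5 ./ompi_mpi
I'm thread 0 out of 3 on MPI process nr. 0 out of 1,while hardware_concurrency reports 12 processors
I'm thread 2 out of 3 on MPI process nr. 0 out of 1,while hardware_concurrency reports 12 processors
I'm thread 1 out of 3 on MPI process nr. 0 out of 1,while hardware_concurrency reports 12 processors

This libgomp-specific behaviour interacts with another behaviour that is specific to Open MPI. Back in 2013, Open MPI changed its default binding policy. The reasons were a mix of technical considerations and politics, and you can read more on Jeff Squyres' blog (Jeff is a core Open MPI developer).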
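The affinity-mask heuristic can be approximated in a few lines. This is a simplified model of the behaviour described above, not libgomp's actual code: both function names are hypothetical, only the first entry of OMP_NUM_THREADS is considered, and the mask query assumes Linux with glibc.

```cpp
#include <sched.h>   // sched_getaffinity, cpu_set_t, CPU_COUNT (Linux/glibc)
#include <cassert>
#include <cstdlib>   // std::strtol

// Number of logical CPUs the calling process is allowed to run on.
int cpus_in_affinity_mask() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return 1;  // conservative fallback if the query fails
    }
    return CPU_COUNT(&mask);
}

// Simplified model: use the first entry of OMP_NUM_THREADS if it is a
// positive integer, otherwise one thread per CPU in the affinity mask.
int default_team_size(const char* omp_num_threads, int cpus_in_mask) {
    assert(cpus_in_mask >= 1);
    if (omp_num_threads != nullptr) {
        char* end = nullptr;
        const long n = std::strtol(omp_num_threads, &end, 10);
        if (end != omp_num_threads && n > 0) {
            return static_cast<int>(n);
        }
    }
    return cpus_in_mask;
}
```

Calling default_team_size(std::getenv("OMP_NUM_THREADS"), cpus_in_affinity_mask()) under taskset or mpirun should then track the team sizes printed by the test program, as long as the runtime really follows this rule.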

The moral of the story is:

Always set the number of OpenMP threads and the MPI binding policy explicitly. With Open MPI, environment variables are passed to the ranks with -x:

$ mpiexec -n 2 --map-by node:PE=3 --bind-to core -x OMP_NUM_THREADS=3 ./ompi_mpi
I'm thread 0 out of 3 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 2 out of 3 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 1 out of 3 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 0 out of 3 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 1 out of 3 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 2 out of 3 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors

Note that I have hyperthreading enabled, so --bind-to core and --bind-to hwthread produce different results when OMP_NUM_THREADS is not set explicitly:

mpiexec -n 2 --map-by node:PE=3 --bind-to core ./ompi_mpi 
I'm thread 0 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 2 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 3 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 5 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 0 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 5 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 1 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 4 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 1 out of 6 on MPI process nr. 1 out of 2,while hardware_concurrency reports 12 processors
I'm thread 3 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 4 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors
I'm thread 2 out of 6 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors

vs.

mpiexec -n 2 --map-by node:PE=3 --bind-to hwthread ./ompi_mpi
I'm thread 0 out of 3 on MPI process nr. 0 out of 2,while hardware_concurrency reports 12 processors

--map-by node:PE=3 maps the ranks by node and assigns three processing elements (PEs) to each MPI rank. When binding to core, a PE is a core. When binding to hardware threads, a PE is a hardware thread, and one should use --map-by node:PE=#cores*#threads, i.e. --map-by node:PE=6 in my case (3 cores with 2 hardware threads each).
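The PE arithmetic above is easy to get wrong by hand, so a launcher script might compute it. The helper below is a hypothetical sketch that only assembles the mpiexec options already shown in this answer. It assumes one OpenMP thread per physical core, which is a design choice for the example and not something Open MPI requires.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds an mpiexec command line with an explicit
// binding policy and an explicit OpenMP thread count.
// Assumption: one OpenMP thread per physical core given to each rank.
std::string hybrid_launch_command(int ranks, int cores_per_rank,
                                  int threads_per_core, bool bind_to_hwthread,
                                  const std::string& program) {
    assert(ranks > 0 && cores_per_rank > 0 && threads_per_core > 0);
    // With core binding a PE is a core; with hwthread binding a PE is a
    // hardware thread, so the count is #cores * #threads.
    const int pe = bind_to_hwthread ? cores_per_rank * threads_per_core
                                    : cores_per_rank;
    return "mpiexec -n " + std::to_string(ranks) +
           " --map-by node:PE=" + std::to_string(pe) +
           " --bind-to " + (bind_to_hwthread ? "hwthread" : "core") +
           " -x OMP_NUM_THREADS=" + std::to_string(cores_per_rank) +
           " " + program;
}
```

For example, hybrid_launch_command(2, 3, 2, false, "./ompi_mpi") reproduces the first mpiexec command above, and passing true for bind_to_hwthread switches to PE=6 with --bind-to hwthread.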

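Whether the OpenMP runtime respects the affinity mask set by MPI, whether it maps its own thread affinity onto it, and what to do if it does not, is a whole different story.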
