如何解决在OpenMP并行中为循环调用Armadillo函数会导致数据损坏
我正在尝试使用arma::solve
使用Armadillo和OpenMP并行处理大型线性系统求解。我不想直接调用求解器,而是将问题分解为较小的RHS向量块,并在OpenMP循环中并行调用它们,如下清单所示。
这将导致相同的答案,因为多个RHS是独立的问题,但是当我以这种方式运行代码时,经常会遇到一些列混乱的情况。我什至尝试将回写括在omp critical
部分中,但仍然失败。
以这种方式运行Armadillo还是安全的?
// to compile the code run as
// g++ parals.cpp -I$ARMADILLO_INCLUDE_DIR -L$ARMADILLO_INCLUDE_DIR/../lib64
// -larmadillo -fopenmp -lopenblas -o parals.out
#define ARMA_DONT_USE_WRAPPER
#define ARMA_USE_BLAS
#define ARMA_USE_LAPACK
#include <armadillo>
#include <omp.h>
#include <iostream>
using namespace arma;
/*
* Solves LS for a single problem,*
* ||AX - B ||_F^2
*/
int main(int argc,char *argv[]) {
int m = atoi(argv[1]); // A is of size m \times n
int n = atoi(argv[2]); // A is of size m \times n
int k = atoi(argv[3]); // B is of size m \times k
int seed = atoi(argv[4]); // seed for random inits
int chunk = atoi(argv[5]); // chunk size to group RHS
arma::arma_rng::set_seed(seed);
std::cout << "m::" << m << "::n::" << n << "::k::" << k
<< "::seed::" << seed << "::chunk::" << chunk
<< std::endl;
mat A(m,n,arma::fill::randu);
//std::cout << "A::" << std::endl << A << std::endl;
mat B(m,k,arma::fill::randu);
//std::cout << "B::" << std::endl << B << std::endl;
mat AtA = A.t() * A;
mat AtB = A.t() * B;
// solve sequentially
mat Xseq = arma::solve(AtA,AtB,arma::solve_opts::likely_sympd);
int num_chunks = AtB.n_cols / chunk;
if (num_chunks * chunk < AtB.n_cols) num_chunks++;
mat Xchunk(n,arma::fill::zeros);
for (int nt = 0; nt < num_chunks; nt++) {
int spanStart = nt * chunk;
int spanEnd = (nt + 1) * chunk - 1;
if (spanEnd > AtB.n_cols - 1) {
spanEnd = AtB.n_cols - 1;
}
mat rhs = AtB.cols(spanStart,spanEnd);
mat Y = arma::solve(AtA,rhs,arma::solve_opts::likely_sympd);
Xchunk.cols(spanStart,spanEnd) = Y;
}
bool chkchunk = arma::approx_equal(Xseq,Xchunk,"absdiff",0.0001);
std::cout << "(Xseq == Xchunk) ? = " << chkchunk << std::endl;
if (!chkchunk) {
std::cout << "Xseq::" << std::endl << Xseq << std::endl;
std::cout << "Xchunk::" << std::endl << Xchunk << std::endl;
}
mat Xpar(n,arma::fill::zeros);
#pragma omp parallel for schedule(static,1)
for (int nt = 0; nt < num_chunks; nt++) {
int spanStart = nt * chunk;
int spanEnd = (nt + 1) * chunk - 1;
if (spanEnd > AtB.n_cols - 1) {
spanEnd = AtB.n_cols - 1;
}
mat rhs = AtB.cols(spanStart,arma::solve_opts::likely_sympd);
Xpar.cols(spanStart,spanEnd) = Y;
}
bool chkpar = arma::approx_equal(Xseq,Xpar,0.0001);
std::cout << "(Xseq == Xpar) ? = " << chkpar << std::endl;
if (!chkpar) {
std::cout << "Xseq::" << std::endl << Xseq << std::endl;
std::cout << "Xpar::" << std::endl << Xpar << std::endl;
}
return 0;
}
一个小的驱动程序脚本,希望可以重现我的错误。我的Armadillo实例与OpenBLAS链接在一起,我没有指定OMP_NUM_THREADS
变量。
#!/bin/bash
for i in {1..20}
do
echo $i
unset OMP_NUM_THREADS;
./parals.out 10 5 20 17 3
done
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。