Using jBLAS with NVBLAS

How to get jBLAS working with NVBLAS

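Because the configure script could not find the library correctly, I compiled jBLAS against NVBLAS by hand, although the workaround is rather hacky. I manually edited jBLAS's configure.out file as follows so that it includes the NVBLAS library.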

BUILD_TYPE=nvblas
CC=gcc
CCC=c99
CFLAGS=-fPIC -DHAS_CPUID
F77=gfortran
FOUND_JAVA=true
FOUND_NM=true
INCDIRS=-Iinclude -I/usr/lib/jvm/java-11-openjdk-amd64/include -I/usr/lib/jvm/java-11-openjdk-amd64/include/linux
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
LAPACK_HOME=./lapack-lite-3.1.1
LD=gcc
LDFLAGS=-shared
LIB=lib
LINKAGE_TYPE=static
LOADLIBES=-Wl,-z,muldefs /home/linyi/jblas/lapack-lite-3.1.1/lapack_LINUX.a /usr/local/cuda-11.0/lib64/libnvblas.so.11 /home/linyi/jblas/lapack-lite-3.1.1/blas_LINUX.a -lgfortran
MAKE=make
NM=nm
OS_ARCH=amd64
OS_ARCH_WITH_FLAVOR=amd64/sse3
OS_NAME=Linux
RUBY=ruby
SO=so

I then ran the documented build commands: make clean all, followed by mvn clean package. The tests passed, but the program caused a segmentation fault on exit.

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.jblas.TestEigen
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to '/home/linyi/nvblas.conf'
Tests run: 2,Failures: 0,Errors: 0,Skipped: 0,Time elapsed: 0.569 sec
Running org.jblas.TestComplexFloat
Tests run: 5,Time elapsed: 0.001 sec
Running org.jblas.TestDecompose
Tests run: 7,Time elapsed: 0.004 sec
Running org.jblas.TestBlasDouble
Tests run: 8,Time elapsed: 0.003 sec
Running org.jblas.TestBlasDoubleComplex
Tests run: 3,Time elapsed: 0.001 sec
Running org.jblas.TestSingular
Tests run: 2,Time elapsed: 0.004 sec
Running org.jblas.TestDoubleMatrix
Tests run: 37,Time elapsed: 0.022 sec
Running org.jblas.TestSolve
Tests run: 5,Time elapsed: 0.001 sec
Running org.jblas.TestBlasFloat
Tests run: 8,Time elapsed: 0.001 sec
Running org.jblas.TestFloatMatrix
Tests run: 37,Time elapsed: 0.012 sec
Running org.jblas.SimpleBlasTest
Tests run: 1,Time elapsed: 0 sec
Running org.jblas.ranges.RangeTest
Tests run: 4,Time elapsed: 0.002 sec
Running org.jblas.TestGeometry
Tests run: 2,Time elapsed: 0.001 sec
Running org.jblas.ComplexDoubleMatrixTest
Tests run: 1,Time elapsed: 0 sec
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334/libjblas_arch_flavor.so
-- org.jblas INFO Deleting /tmp/jblas4383455253907276334
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fa1f6bb96b1,pid=8063,tid=8072
#
# JRE version: OpenJDK Runtime Environment (11.0.8+10) (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1)
# Java VM: OpenJDK 64-Bit Server VM (11.0.8+10-post-Ubuntu-0ubuntu118.04.1,mixed mode,sharing,tiered,compressed oops,g1 gc,linux-amd64)
# Problematic frame:
# C  [libcublas.so.11+0xa096b1]
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/linyi/jblas/jblas/core.8063)
#
# An error report file with more information is saved as:
# /home/linyi/jblas/jblas/hs_err_pid8063.log
#
# If you would like to submit a bug report,please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#
Aborted (core dumped)

Results :

Tests run: 122,Skipped: 0

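Since the tests themselves appeared to pass and the program only segfaulted at termination, I decided to run mvn clean package -DskipTests. However, when I used the library in a Java project, nvblas.log revealed that although NVBLAS intercepted the calls to the BLAS routines, they were actually executed on the CPU rather than the GPU. Running nvprof --print-gpu-summary on my program led to the same conclusion.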

#
==7711== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  1.8240us         1  1.8240us  1.8240us  1.8240us  [CUDA memcpy HtoD]
======== Error: Application received signal 134

The contents of nvblas.log are as follows:

[NVBLAS] Using devices :0
[NVBLAS] Config parsed
[NVBLAS] dgemm[cpu]: ta=N,tb=N,m=1,n=1,k=1
[NVBLAS] dsyr2k[cpu]: up=U,ta=N,n=24,k=28
[NVBLAS] dsyr2k[cpu]: up=U,n=32,n=26,n=22,n=20,k=28
[NVBLAS] dtrmm[cpu]: si=R,up=U,di=U,m=52,n=31
[NVBLAS] dtrmm[cpu]: si=R,m=54,up=L,ta=T,di=N,m=60,n=31
[NVBLAS] dsyr2k[cpu]: up=U,n=22
[NVBLAS] dgemm[cpu]: ta=T,k=31
[NVBLAS] dtrmm[cpu]: si=R,n=28
[NVBLAS] dtrmm[cpu]: si=R,n=20
[NVBLAS] dtrmm[cpu]: si=R,n=28,k=31
[NVBLAS] dgemm[cpu]: ta=T,k=31
[NVBLAS] dgemm[cpu]: ta=N,tb=T,m=31,n=54,k=22
. . .
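
To see at a glance where the intercepted calls ended up, the log can be tallied by routine and device. The following is a minimal sketch, not part of the original post. It assumes every dispatched call is logged as "[NVBLAS] routine[device]: ..." as in the excerpt above, and that GPU-routed calls carry a [gpu] tag in the same position (an assumption, since the excerpt only shows [cpu]). The function name tally_devices is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "[NVBLAS] dgemm[cpu]: ta=N,tb=N,m=1,n=1,k=1".
# Assumption: GPU-routed calls look the same but with "[gpu]" as the tag.
CALL = re.compile(r"^\[NVBLAS\]\s+(\w+)\[(\w+)\]:")


def tally_devices(lines):
    """Count logged BLAS calls per (routine, device) pair."""
    counts = Counter()
    for line in lines:
        match = CALL.match(line.strip())
        if match:  # skip status lines such as "[NVBLAS] Config parsed"
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Any iterable of strings works as input, so an open file handle for nvblas.log can be passed directly. If the result contains only cpu entries, as in the log above, nothing was offloaded during that run.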

I am really at a loss here and would appreciate any suggestions; there seems to be very little evidence to go on.

Solution

I believe I have figured it out.

nm libjblas.so | grep -is dgemm
         U cblas_dgemm
         U dgemm_@@libnvblas.so.11

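This shows that the library is indeed linked correctly. I then ran jBLAS's built-in benchmark with java -jar jblas.jar (where jblas.jar is the compiled library). Apparently GPU offloading only happens for large matrices: for small sizes such as n=10 the calls stayed on the CPU, but for n=100 nvblas.log recorded GPU computations. This confused me at first, and I hope it helps others who are struggling with the same problem.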
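
One way to check the size dependence described above is to compare the problem sizes that were logged for each device. The sketch below is illustrative only. It assumes the log format shown earlier, with the dimensions given as m=, n= and k= fields, and a [gpu] tag for offloaded calls (an assumption). It uses the largest dimension of each call as a rough measure of its size. The name size_ranges is hypothetical.

```python
import re

# "[NVBLAS] routine[device]: key=value,..." as seen in nvblas.log.
LINE = re.compile(r"^\[NVBLAS\]\s+(\w+)\[(\w+)\]:\s*(.*)$")
# Only the dimension fields m, n and k; flags such as ta=N are ignored.
DIM = re.compile(r"\b([mnk])=(\d+)")


def size_ranges(lines):
    """Return {device: (smallest, largest)} over the largest m/n/k of each call."""
    ranges = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        dims = [int(value) for _, value in DIM.findall(match.group(3))]
        if not dims:
            continue
        size = max(dims)
        device = match.group(2)
        low, high = ranges.get(device, (size, size))
        ranges[device] = (min(low, size), max(high, size))
    return ranges
```

If the cpu range tops out well below where the gpu range starts, that is consistent with offloading depending on problem size. It only describes what this particular run logged and does not reveal the exact rule NVBLAS applies.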
