What is causing numpy's poor performance, and how can I fix it?
I have been running into performance problems with numpy. I tested different ways of computing a dot product, but other numpy functions show similar performance issues.
import timeit
import numpy as np
import tensorflow as tf

def np_dot(U, X):
    tic = timeit.default_timer()
    X_bar = np.dot(U.T, X)
    toc = timeit.default_timer()
    print("np.dot took {:.4f} s".format(toc - tic))
    return X_bar

def np_at(U, X):
    tic = timeit.default_timer()
    X_bar = U.T @ X
    toc = timeit.default_timer()
    print("np @ took {:.4f} s".format(toc - tic))
    return X_bar

def np_matmul(U, X):
    tic = timeit.default_timer()
    X_bar = np.matmul(U.T, X)
    toc = timeit.default_timer()
    print("np.matmul took {:.4f} s".format(toc - tic))
    return X_bar

def np_einsum(U, X):
    tic = timeit.default_timer()
    X_bar = np.einsum('ij,jk', U.T, X)
    toc = timeit.default_timer()
    print("np.einsum took {:.4f} s".format(toc - tic))
    return X_bar

def tf_matmul(U, X):
    tic = timeit.default_timer()
    X_bar = tf.matmul(U.T, X)
    toc = timeit.default_timer()
    print("tf.matmul took {:.4f} s".format(toc - tic))
    return X_bar

if __name__ == "__main__":
    print("is_gpu_available?", tf.test.is_gpu_available(), tf.test.is_built_with_cuda())
    print("numpy version:", np.__version__)
    print("tensorflow version:", tf.__version__)
    M = N = 1000
    X = np.random.rand(M, N)
    U = np.random.rand(M, N)
    X_bar = np_dot(U, X)
    X_bar0 = np_at(U, X)
    X_bar1 = np_matmul(U, X)
    X_bar2 = np_einsum(U, X)
    X_bar3 = tf_matmul(U, X)
    print(np.allclose(X_bar, X_bar0))
    print(np.allclose(X_bar, X_bar1))
    print(np.allclose(X_bar, X_bar2))
    print(np.allclose(X_bar, X_bar3))
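As a side note, single one-shot timings like the ones above are noisy (the first call can include warm-up costs). A more robust way to time the same operation is timeit.repeat with the same matrix sizes, taking the minimum over several repetitions (a sketch, independent of the tensorflow parts):

```python
import timeit
import numpy as np

M = N = 1000
X = np.random.rand(M, N)
U = np.random.rand(M, N)

# Best of 5 single-call repetitions; the minimum is the least noisy
# estimate of the achievable runtime for U.T @ X.
best = min(timeit.repeat(lambda: U.T @ X, number=1, repeat=5))
print("U.T @ X best of 5: {:.4f} s".format(best))
```

This does not change the conclusion below, but it rules out one-off effects as the explanation for the slowdown.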
The output on my laptop (Windows 10, Python 3.8) is:
is_gpu_available? False False
numpy version: 1.19.2
tensorflow version: 2.2.0
np.dot took 12.5368 s
np @ took 14.3303 s
np.matmul took 14.0634 s
np.einsum took 0.5937 s
tf.matmul took 0.0263 s
True
True
True
True
This means the standard @ operator is roughly 545 times slower than it should be! On a friend's laptop (Ubuntu, Python 3.6.9), the results look much more like what you would expect:
is_gpu_available? False False
numpy version: 1.19.5
tensorflow version: 1.14.0
np.dot took 0.0192 s
np @ took 0.0196 s
np.matmul took 0.0208 s
np.einsum took 0.3893 s
tf.matmul took 0.0280 s
True
True
True
True
Interestingly, numpy's einsum does not seem to be affected by whatever is going wrong on my laptop, and neither is tensorflow.
What is going on here? What could cause such an enormous performance difference?
Edit:
Always make sure numpy is actually using a BLAS library. You can read more about it here: Boosting numpy: Why BLAS Matters.
In my case, numpy.show_config() returned the following:
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
NOT AVAILABLE
atlas_3_10_blas_threads_info:
NOT AVAILABLE
atlas_3_10_blas_info:
NOT AVAILABLE
atlas_blas_threads_info:
NOT AVAILABLE
atlas_blas_info:
NOT AVAILABLE
accelerate_info:
NOT AVAILABLE
blas_info:
NOT AVAILABLE
blas_src_info:
NOT AVAILABLE
blas_opt_info:
NOT AVAILABLE
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
NOT AVAILABLE
openblas_clapack_info:
NOT AVAILABLE
flame_info:
NOT AVAILABLE
atlas_3_10_threads_info:
NOT AVAILABLE
atlas_3_10_info:
NOT AVAILABLE
atlas_threads_info:
NOT AVAILABLE
atlas_info:
NOT AVAILABLE
lapack_info:
NOT AVAILABLE
lapack_src_info:
NOT AVAILABLE
lapack_opt_info:
NOT AVAILABLE
numpy_linalg_lapack_lite:
language = c
define_macros = [('HAVE_BLAS_ILP64',None),('BLAS_SYMBOL_SUFFIX','64_')]
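Since every BLAS backend reports NOT AVAILABLE, numpy is falling back to its slow bundled lapack_lite routines, which explains the slowdown. After reinstalling numpy from an official wheel (which ships OpenBLAS), a quick sanity check along these lines can confirm the fix (a sketch; the 500x500 size is just an illustrative choice):

```python
import timeit
import numpy as np

# On a properly BLAS-linked NumPy, a 500x500 matmul should take on the
# order of milliseconds, not seconds.
a = np.random.rand(500, 500)
t = timeit.timeit(lambda: a @ a, number=5) / 5
print("500x500 matmul: {:.1f} ms per call".format(t * 1000))

# Inspect the build configuration; look for an openblas_info or
# blas_mkl_info entry that is not "NOT AVAILABLE".
np.show_config()
```

If the config still shows no BLAS backend, reinstalling with `pip install --force-reinstall numpy` (or via conda) usually pulls in a wheel with OpenBLAS.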