
What could be causing numpy to perform so poorly?


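I keep running into performance problems with numpy. I benchmarked several ways of computing a dot product, but other numpy functions show similar slowdowns.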

import timeit
import numpy as np
import tensorflow as tf

def np_dot(U,X):
    tic = timeit.default_timer()
    X_bar = np.dot(U.T,X)
    toc = timeit.default_timer()
    print("np.dot took {:.4f} s".format(toc-tic))
    return X_bar

def np_at(U,X):
    tic = timeit.default_timer()
    X_bar = U.T @ X
    toc = timeit.default_timer()
    print("np @ took {:.4f} s".format(toc-tic))
    return X_bar

def np_matmul(U,X):
    tic = timeit.default_timer()
    X_bar = np.matmul(U.T,X)
    toc = timeit.default_timer()
    print("np.matmul took {:.4f} s".format(toc-tic))
    return X_bar

def np_einsum(U,X):
    tic = timeit.default_timer()
    X_bar = np.einsum('ij,jk',U.T,X)
    toc = timeit.default_timer()
    print("np.einsum took {:.4f} s".format(toc-tic))
    return X_bar

def tf_matmul(U,X):
    tic = timeit.default_timer()
    X_bar = tf.matmul(U.T,X)
    toc = timeit.default_timer()
    print("tf.matmul took {:.4f} s".format(toc-tic))
    return X_bar


if __name__ == "__main__":
    print("is_gpu_available?",tf.test.is_gpu_available(),tf.test.is_built_with_cuda())
    print("numpy version:",np.__version__)
    print("tensorflow version:",tf.__version__)
    
    M = N = 1000
    X = np.random.rand(M,N)
    U = np.random.rand(M,N)
    X_bar = np_dot(U,X)
    X_bar0 = np_at(U,X)
    X_bar1 = np_matmul(U,X)
    X_bar2 = np_einsum(U,X)
    X_bar3 = tf_matmul(U,X)
    print(np.allclose(X_bar,X_bar0))
    print(np.allclose(X_bar,X_bar1))
    print(np.allclose(X_bar,X_bar2))
    print(np.allclose(X_bar,X_bar3))

The output on my laptop (Windows 10, Python 3.8) is:

is_gpu_available? False False
numpy version: 1.19.2
tensorflow version: 2.2.0
np.dot took 12.5368 s
np @ took 14.3303 s
np.matmul took 14.0634 s
np.einsum took 0.5937 s
tf.matmul took 0.0263 s
True
True
True
True

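That means the standard @ operator is about 545 times slower than tf.matmul on the same machine (14.3303 s versus 0.0263 s)! On a friend's laptop (Ubuntu, Python 3.6.9) the results look much more like what I would expect: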

is_gpu_available? False False
numpy version: 1.19.5
tensorflow version: 1.14.0
np.dot took 0.0192 s
np @ took 0.0196 s
np.matmul took 0.0208 s
np.einsum took 0.3893 s
tf.matmul took 0.0280 s
True
True
True
True

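Interestingly, numpy's einsum does not seem to be affected by whatever is wrong on my laptop, and neither is tensorflow.

What is going on here? What could cause such a huge performance difference?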
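
A single tic/toc measurement can include one-time costs such as library initialization, so repeating each call and keeping the fastest run gives a steadier comparison. The following is a minimal sketch using only the standard library; the helper name `best_time` and the pure-Python `matmul` stand-in are hypothetical and not part of the benchmark above. Warm-up effects are unlikely to explain a gap of several hundred times, but this at least rules them out.

```python
import timeit

def best_time(func, *args, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    # Each measurement runs func(*args) exactly once (number=1);
    # taking the minimum filters out warm-up and background noise.
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=1)
    return min(timings)

def matmul(a, b):
    """Hypothetical pure-Python stand-in for the function under test."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print("matmul best of 5: {:.6f} s".format(best_time(matmul, a, b)))
```

With the arrays from the benchmark above, a call such as `best_time(np.dot, U.T, X)` could stand in for the manual tic/toc pair.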

Edit:

Always make sure numpy is using a BLAS library. Read more about it here: Boosting numpy: Why BLAS Matters

In my case, numpy.show_config() prints the following:

blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
accelerate_info:
  NOT AVAILABLE
blas_info:
  NOT AVAILABLE
blas_src_info:
  NOT AVAILABLE
blas_opt_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
openblas_clapack_info:
  NOT AVAILABLE
flame_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
atlas_threads_info:
  NOT AVAILABLE
atlas_info:
  NOT AVAILABLE
lapack_info:
  NOT AVAILABLE
lapack_src_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
numpy_linalg_lapack_lite:
    language = c
    define_macros = [('HAVE_BLAS_ILP64',None),('BLAS_SYMBOL_SUFFIX','64_')]
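
A listing like this is easier to act on when reduced to the sections that were actually found. The sketch below parses text in the legacy `show_config()` layout shown above (an unindented `name:` line followed by indented details) and reports which sections are available. The function name `available_sections` and the shortened `SAMPLE` string are hypothetical, and newer NumPy releases print a different layout that this parser does not handle.

```python
def available_sections(config_text):
    """Map each section name to True if it lists details, False if NOT AVAILABLE."""
    status = {}
    current = None
    for line in config_text.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # Unindented "name:" lines start a new section.
            current = line.strip().rstrip(":")
            status[current] = False
        elif current is not None and line.strip() != "NOT AVAILABLE":
            # Any indented detail other than NOT AVAILABLE means the section was found.
            status[current] = True
    return status

# Shortened, hypothetical sample in the same layout as the listing above.
SAMPLE = """\
blas_opt_info:
  NOT AVAILABLE
lapack_opt_info:
  NOT AVAILABLE
numpy_linalg_lapack_lite:
    language = c
"""

if __name__ == "__main__":
    status = available_sections(SAMPLE)
    if not status.get("blas_opt_info", False):
        print("No optimized BLAS found; np.dot may fall back to a slow built-in routine.")
```

To feed it real output, the printed text could be captured by wrapping the `numpy.show_config()` call in `contextlib.redirect_stdout(io.StringIO())`, assuming the installed version prints to standard output in this layout.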
