Python - Vectorized function is slower than calling it on each element in a for loop; NumPy slower than native Python

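I am writing code to run a transient simulation that calls a function for every second of a day, and eventually for a whole year. I thought I had found an opportunity to speed up the code by passing vectors of inputs instead of calling the function in a for loop. However, when I do this the code runs slower, and I do not understand why.

I expected the vectorized version to be only slightly slower than a single call in the for loop, which would give me the speedup I am after.

Can you help explain and/or solve this? The example below shows three approaches; it is a simplification of a larger program.

Inside the function there is an Eg variable that is currently set to zero. If I instead write Eg=float(0) or Eg=np.array([0,0]), the code runs slower, and I think this is the same issue as the larger problem.

The results of the code below are: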

Execution time for numpy vector is 716.225 ms
Execution time for 'for-loop' 6 calls is 389.87 ms
Execution time for numpy float32 'for-loop' is 3906.9069999999997 ms

Code example:

from datetime import datetime,timedelta
import numpy as np

def Q_Walls( A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p):
    
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return T_p1,Eout,T2_surf

def Q_Walls_vect( A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p):
    
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0 #np.array([0,0],'float64')
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return T_p1,Eout,T2_surf


# Case 1: one call per time step on small float32 arrays
A= R1= R2= R3= R4= m= cp= np.array([1,1,1],'float32')
dt= np.array([1,1,1],'float32')
T_inf_inside = np.array([250,250,250],'float32')
T_inf_outside = np.array([250.2,250.2,250.2],'float32')
T_p_wall = np.array([250.1,250.1,250.1],'float32')

t_max =87000

begin_time = datetime.now()

for x in np.arange(t_max):
    T_p_wall,Enet_wall,Tinside_surf = Q_Walls_vect(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall)
    
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy vector is {end_time.total_seconds()*1000} ms")

# Case 2: six calls per time step on native Python floats
A= R1= R2= R3= R4= m= cp= float(1.1)
dt= float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)

begin_time = datetime.now()

for x in np.arange(t_max):
    for j in range(6):
        T_p_wall,Enet_wall,Tinside_surf = Q_Walls(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall)
    
end_time = (datetime.now() - begin_time)
  
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")

# Case 3: six calls per time step on np.float32 scalars
A= R1= R2= R3= R4= m= cp= np.float32(1.1)
dt= 1
T_inf_inside = np.float32(250.01)
T_p_wall = np.float32(250.1)
T_inf_outside = np.float32(250.2)

begin_time = datetime.now()

for x in np.arange(t_max):
    for j in range(6):
        T_p_wall,Enet_wall,Tinside_surf = Q_Walls(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall)
    
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy float32 'for-loop' is {end_time.total_seconds()*1000} ms")

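Solution

There are several issues in the code:

Numpy is fairly fast for big arrays but not for very small ones, because creating/allocating/freeing temporary arrays is expensive, and so is calling native Numpy functions from the Python interpreter.
Variables of integer and float32 types are promoted to float64 in the following binary operations: [int] BIN_OP [float32] and [float32] BIN_OP [float64], and likewise with the operands in the reverse order. This causes more temporary arrays to be created and several implicit conversions to be performed, which makes the code significantly slower.
CPython loops are very slow because CPython is an interpreter.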
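To see the promotion described in the second point on your own installation, a minimal sketch like the following can help. The variable names (x32, same, mixed, with_int) are made up for illustration. Note that the result of mixing a float32 scalar with a plain Python int depends on the NumPy version (NumPy 2 changed the scalar promotion rules), so that case is printed rather than assumed.

```python
import numpy as np

x32 = np.float32(1.5)

# float32 combined with float32 stays float32.
same = x32 / np.float32(2)

# float32 combined with float64 is promoted to float64.
mixed = x32 * np.float64(2)

# float32 scalar combined with a Python int: the result dtype depends on
# the NumPy version, so inspect it instead of assuming it.
with_int = x32 / 2

print("float32 / float32 ->", same.dtype)
print("float32 * float64 ->", mixed.dtype)
print("float32 / int     ->", with_int.dtype)
```

If the last line reports float64, every `R2/2` in the original function silently leaves float32.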

The second point can be fixed with the following example code:

f32_const_0 = np.float32(0)
f32_const_2 = np.float32(2)

def Q_Walls_float32( A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/f32_const_2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/f32_const_2) # convection and conduction only
    Eg = f32_const_0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/f32_const_2 / A)
    
    return T_p1,Eout,T2_surf

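You can use Numba (or Cython) to reduce the remaining costs. Note that it is best not to use Numpy arrays for only a few elements; alternatively, do the computation element-wise directly in Numba so that few temporary arrays are created.

Here is a Numba code example: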

from datetime import datetime,timedelta
import numpy as np
import numba as nb

A= R1= R2= R3= R4= m= cp= float(1.1)
dt= float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)
t_max = 87000 # same value as in the question

f64 = nb.float64

# Explicit signatures make Numba compile eagerly, so compilation is not part of the timing below
@nb.njit(nb.types.UniTuple(f64,3)(f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64))
def Q_Walls( A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return (T_p1,Eout,T2_surf)

@nb.njit(nb.types.UniTuple(f64,2)(f64,f64,f64,f64,f64,f64,f64,f64,f64,f64,f64))
def compute_with_numba(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall):
    for x in np.arange(t_max):
        for j in range(6):
            T_p_wall,Enet_wall,Tinside_surf = Q_Walls(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall)
    return (T_p_wall,Tinside_surf)

begin_time = datetime.now()

T_p_wall,Tinside_surf = compute_with_numba(A,R1,R2,R3,R4,m,cp,T_inf_inside,T_inf_outside,dt,T_p_wall)
    
end_time = (datetime.now() - begin_time)
  
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")

Here are the timing results on my machine:

Initial execution:

Execution time for numpy vector is 758.232 ms
Execution time for 'for-loop' 6 calls is 256.093 ms
Execution time for numpy float32 'for-loop' is 3768.253 ms

----------

Fixed execution (Q_Walls_float32):

Execution time for numpy float32 'for-loop' is 839.016 ms

----------

With Numba (compute_with_numba):

Execution time for 'for-loop' 6 calls is 6.311 ms
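
These numbers are consistent with the first point: for tiny arrays, the fixed cost of each Numpy call dominates. A rough way to check this yourself is to time the same arithmetic for several array lengths and compare the cost per call with the cost per element. This is only a sketch; wall_step, make_inputs and seconds_per_call are hypothetical helper names, the array lengths are arbitrary, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

TWO = np.float32(2)

def wall_step(A, R1, R2, R3, R4, m, cp, T_in, T_out, dt, T_p):
    """One time step, same arithmetic as Q_Walls_float32 with Eg = 0."""
    Ein = (T_out - T_p) * A / (R2 / TWO + R3 + R4)
    Eout = (T_p - T_in) * A / (R1 + R2 / TWO)
    T_p1 = (Ein - Eout) * dt / (m * cp) + T_p
    T2_surf = T_p - Eout * R2 / TWO / A
    return T_p1, Eout, T2_surf

def make_inputs(n):
    """Build float32 input arrays of length n, using values from the question."""
    ones = np.ones(n, dtype=np.float32)
    T_in = np.full(n, 250.0, dtype=np.float32)
    T_out = np.full(n, 250.2, dtype=np.float32)
    T_p = np.full(n, 250.1, dtype=np.float32)
    return (ones, ones, ones, ones, ones, ones, ones, T_in, T_out, ones, T_p)

def seconds_per_call(n, calls=1000):
    """Average wall-clock time of one wall_step call on arrays of length n."""
    args = make_inputs(n)
    return timeit.timeit(lambda: wall_step(*args), number=calls) / calls

for n in (3, 300, 30_000):
    t = seconds_per_call(n)
    print(f"n={n:>6}: {t * 1e6:9.2f} us per call, {t / n * 1e9:9.2f} ns per element")
```

If the time per call barely changes between the smallest lengths while the time per element drops sharply, the per-call overhead is what you are paying for, and batching more elements per call (or moving the loop into Numba) is where the gain is.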