Broadcast or tile a NumPy matrix and apply a sum


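Maybe I'm just overlooking something very trivial, but I can't seem to figure it out.

I'm looking for the best (fast, numpythonic) way to:

- add two matrices (A + B),
- sum the values at the same position in each matrix (A[0,0] + B[0,0], etc.),
- (implicitly) expand B to match the shape of A (as numpy.tile would).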
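For context, the implicit expansion asked for here is not what NumPy's built-in broadcasting does: broadcasting only stretches axes of length 1, so it cannot repeat a (2,2) matrix across a (4,5) one. A minimal sketch of that failure, using the same A and B as in the rest of the post (the `broadcast_ok` flag is only an illustrative name):

```python
import numpy as np

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))

try:
    A + B  # shapes (4,5) and (2,2) are not broadcast-compatible
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("broadcasting failed:", err)
```

This is why B has to be repeated explicitly, or indexed periodically, before the sum.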

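Tiling B and then cropping it to A.shape gives the expected result. But this seems:

- slow, and
- very verbose.

Details of my approach, an alternative, and their performance follow.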

import numpy as np

# Matrix A
A = np.arange(20).reshape((4,5))
A

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

# Matrix B
B = np.arange(4).reshape((2,2))
B

array([[0, 1],
       [2, 3]])

# Matrix F
# Tile B as much as required to cover the shape of A
scale = np.ceil(np.divide(A.shape,B.shape)).astype(int)
F = np.tile(B,scale)
F

array([[0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3]])

# Crop F to match the size of A
F = F[0:A.shape[0],0:A.shape[1]]
F

array([[0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2],
       [0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2]])

# Sum A and B (tiled + cropped)
A + F

array([[ 0,  2,  2,  4,  4],
       [ 7,  9,  9, 11, 11],
       [10, 12, 12, 14, 14],
       [17, 19, 19, 21, 21]])

The approach above, turned into a function:

def expanded_sum(A,B):
    scale = np.ceil(np.divide(A.shape,B.shape)).astype(int)
    F = np.tile(B,scale)
    F = F[0:A.shape[0],0:A.shape[1]]
    return A+F
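The function above builds the full tiled matrix F before adding it. One possible variation, sketched here under the assumption that A and B share a dtype, adds B block by block into a copy of A so that no tiled temporary is created. The name `blockwise_sum` is hypothetical and not part of the original post, and no claim is made about its speed.

```python
import numpy as np

def blockwise_sum(A, B):
    """Add B periodically onto a copy of A, one B-sized block at a time."""
    Bh, Bw = B.shape
    out = A.copy()
    for y in range(0, A.shape[0], Bh):
        for x in range(0, A.shape[1], Bw):
            block = out[y:y + Bh, x:x + Bw]  # a view into out; edge blocks may be smaller
            block += B[:block.shape[0], :block.shape[1]]  # in-place add on the view
    return out

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))
result = blockwise_sum(A, B)
print(result)
```

The Python loop runs once per block rather than once per element, so it only makes sense when B is not tiny relative to A.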

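Another approach: create a new matrix and fill it iteratively with A[y,x] + B[y % h,x % w] (where w and h are the width and height of B).

I suppose it would be faster if numpy could do this internally instead of in Python code.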

def sum_mod_2D(A,B):
    Bh,Bw = B.shape
    res = np.zeros(A.shape)
    for y,row in enumerate(A):
        for x,v in enumerate(row):
            res[y,x] = v + B[y%Bh,x%Bw]
    return res

A = np.arange(20).reshape((4,5))
display(A)
B = np.arange(4).reshape((2,2))
display(B)
sum_mod_2D(A,B)

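array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

array([[0, 1],
       [2, 3]])

array([[ 0.,  2.,  2.,  4.,  4.],
       [ 7.,  9.,  9., 11., 11.],
       [10., 12., 12., 14., 14.],
       [17., 19., 19., 21., 21.]])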
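The same modulo lookup can be pushed into NumPy with integer-array indexing, which is one way to read the wish that numpy "do this internally". The sketch below is not from the original post; `sum_mod_indexed` is a hypothetical name, and `np.ix_` is used to combine the periodic row and column indices into an open mesh.

```python
import numpy as np

def sum_mod_indexed(A, B):
    """Vectorized A[y, x] + B[y % Bh, x % Bw] without a Python-level loop."""
    Bh, Bw = B.shape
    rows = np.arange(A.shape[0]) % Bh  # periodic row indices into B
    cols = np.arange(A.shape[1]) % Bw  # periodic column indices into B
    return A + B[np.ix_(rows, cols)]   # B is gathered to A's shape, then added

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))
result = sum_mod_indexed(A, B)
print(result)
```

Unlike sum_mod_2D, which fills an np.zeros array and therefore returns floats, this version keeps the integer dtype of its inputs.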

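To test the performance of my approaches, I used timeit and compared the results with a plain sum of equal-sized matrices.

I used matrices of size NxN, where N = [10, 100, 1000].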

sizes = [10,100,1000]

import itertools as it

for s in sizes:
    print("{0}x{0} + {1}x{1}".format(s,s))
    A = np.random.randint(255,size=(s,s))
    B = np.random.randint(255,size=(s,s))
    
    t1 = %timeit -o A+B
    t2 = %timeit -o expanded_sum(A,B)
    t3 = %timeit -o sum_mod_2D(A,B)
    
    print("expanded_sum is {0:.0f}x slower".format(t2.average / t1.average))
    print("  sum_mod_2D is {0:.0f}x slower".format(t3.average / t1.average))
    print()

10x10 + 10x10
440 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs,1000000 loops each)
10.7 µs ± 163 ns per loop (mean ± std. dev. of 7 runs,100000 loops each)
52.6 µs ± 799 ns per loop (mean ± std. dev. of 7 runs,10000 loops each)
expanded_sum is 24x slower
  sum_mod_2D is 119x slower

100x100 + 100x100
4.61 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs,100000 loops each)
20.2 µs ± 546 ns per loop (mean ± std. dev. of 7 runs,100000 loops each)
4.33 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs,100 loops each)
expanded_sum is 4x slower
  sum_mod_2D is 940x slower

1000x1000 + 1000x1000
1.21 ms ± 16 µs per loop (mean ± std. dev. of 7 runs,1000 loops each)
5.18 ms ± 149 µs per loop (mean ± std. dev. of 7 runs,100 loops each)
441 ms ± 8.74 ms per loop (mean ± std. dev. of 7 runs,1 loop each)
expanded_sum is 4x slower
  sum_mod_2D is 366x slower
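The benchmark above relies on the IPython `%timeit` magic, so it only runs inside IPython or Jupyter. A rough equivalent for a plain Python script can be sketched with the standard-library `timeit` module. The helper `mean_time` and the choice of 200 repetitions are assumptions made for illustration, and absolute timings will differ from the figures above.

```python
import timeit
import numpy as np

def expanded_sum(A, B):
    # Same tile-and-crop approach as in the post.
    scale = np.ceil(np.divide(A.shape, B.shape)).astype(int)
    F = np.tile(B, scale)
    F = F[0:A.shape[0], 0:A.shape[1]]
    return A + F

def mean_time(fn, number=200):
    """Average seconds per call over `number` calls (hypothetical helper)."""
    return timeit.timeit(fn, number=number) / number

s = 100
A = np.random.randint(255, size=(s, s))
B = np.random.randint(255, size=(s, s))

t_plain = mean_time(lambda: A + B)
t_tiled = mean_time(lambda: expanded_sum(A, B))
print("expanded_sum is {0:.1f}x slower".format(t_tiled / t_plain))
```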
