How to broadcast or tile a numpy matrix and apply a sum
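Maybe I'm just overlooking something trivial, but I can't seem to figure it out.
I'm looking for the best (fast, numpythonic) way to:
- add two matrices (A + B),
- summing the values at the same positions (A[0,0] + B[0,0], etc.),
- while (implicitly) expanding B to match the shape of A (e.g. with numpy.tile).
Tiling B and then cropping it to A.shape gives the expected result, but this seems:
- slow, and
- very verbose.
Details of my approach, an alternative, and their performance follow.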
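For context, plain broadcasting does not cover this case: NumPy only stretches axes of length 1, so a (2, 2) array cannot be added directly to a (4, 5) one. A minimal sketch using the same shapes as the example in this post (the variable broadcast_ok is only there for illustration):

```python
import numpy as np

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))

try:
    A + B  # shapes (4, 5) and (2, 2) are not broadcast-compatible
    broadcast_ok = True
except ValueError as exc:
    broadcast_ok = False
    print("broadcasting failed:", exc)
```

This is why B has to be repeated explicitly (or indexed periodically) before the addition.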
import numpy as np

# Matrix A
A = np.arange(20).reshape((4, 5))
A
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
# Matrix B
B = np.arange(4).reshape((2, 2))
B
array([[0, 1],
       [2, 3]])
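# Matrix F
# Tile B as much as required to cover the shape of A
scale = np.ceil(np.divide(A.shape, B.shape)).astype(int)
F = np.tile(B, scale)
F
array([[0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3],
       [0, 1, 0, 1, 0, 1],
       [2, 3, 2, 3, 2, 3]])
# Crop F to match the size of A
F = F[0:A.shape[0], 0:A.shape[1]]
F
array([[0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2],
       [0, 1, 0, 1, 0],
       [2, 3, 2, 3, 2]])
# Sum A and B (tiled + cropped)
A + F
array([[ 0,  2,  2,  4,  4],
       [ 7,  9,  9, 11, 11],
       [10, 12, 12, 14, 14],
       [17, 19, 19, 21, 21]])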
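The approach above, turned into a function:
def expanded_sum(A, B):
    # Number of repetitions of B needed along each axis to cover A
    scale = np.ceil(np.divide(A.shape, B.shape)).astype(int)
    F = np.tile(B, scale)
    # Crop the tiled array to the shape of A
    F = F[0:A.shape[0], 0:A.shape[1]]
    return A + F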
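One possibly less verbose way to build the tiled-and-cropped array is ndarray.take with mode="wrap", which wraps out-of-range indices back into B. This is only a sketch under the assumption that both inputs are 2-D; the name expanded_sum_take is hypothetical and not part of the original post, and it has not been benchmarked here.

```python
import numpy as np

def expanded_sum_take(A, B):
    """Add B to A, repeating B periodically along both axes (hypothetical helper)."""
    rows = np.arange(A.shape[0])
    cols = np.arange(A.shape[1])
    # mode="wrap" maps each out-of-range index i to i % B.shape[axis]
    F = B.take(rows, axis=0, mode="wrap").take(cols, axis=1, mode="wrap")
    return A + F

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))
result = expanded_sum_take(A, B)
print(result)
```

It avoids the separate scale computation and the crop, at the cost of two intermediate copies.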
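An alternative approach: create a new matrix and fill it iteratively with A[y,x] + B[y % h, x % w] (where w and h are the width and height of B).
I'd expect this to be faster if numpy could do it internally rather than in Python code.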
def sum_mod_2D(A, B):
    Bh, Bw = B.shape
    res = np.zeros(A.shape)
    for y, row in enumerate(A):
        for x, v in enumerate(row):
            # Wrap the indices so B repeats periodically
            res[y, x] = v + B[y % Bh, x % Bw]
    return res
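A = np.arange(20).reshape((4, 5))
display(A)
B = np.arange(4).reshape((2, 2))
display(B)
sum_mod_2D(A, B)
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
array([[0, 1],
       [2, 3]])
array([[ 0.,  2.,  2.,  4.,  4.],
       [ 7.,  9.,  9., 11., 11.],
       [10., 12., 12., 14., 14.],
       [17., 19., 19., 21., 21.]])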
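The same modulo lookup can also be written with integer-array indexing, so the loop runs inside NumPy instead of in Python. A minimal sketch, assuming 2-D inputs; the name sum_mod_vectorized is hypothetical and not from the original post. Unlike sum_mod_2D it keeps the integer dtype, because it does not start from np.zeros.

```python
import numpy as np

def sum_mod_vectorized(A, B):
    """Vectorized form of A[y, x] + B[y % Bh, x % Bw] (hypothetical helper)."""
    Bh, Bw = B.shape
    rows = np.arange(A.shape[0]) % Bh  # row of B to use for each row of A
    cols = np.arange(A.shape[1]) % Bw  # column of B to use for each column of A
    # np.ix_ builds an open mesh, so the indexed result has the shape of A
    return A + B[np.ix_(rows, cols)]

A = np.arange(20).reshape((4, 5))
B = np.arange(4).reshape((2, 2))
result = sum_mod_vectorized(A, B)
print(result)
```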
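To test the performance of my approaches, I used timeit and compared the results against summing two matrices of equal size.
I used matrices of size NxN, with N = [10, 100, 1000].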
sizes = [10, 100, 1000]
import itertools as it
for s in sizes:
    print("{0}x{0} + {1}x{1}".format(s, s))
    A = np.random.randint(255, size=(s, s))
    B = np.random.randint(255, size=(s, s))
    # %timeit is an IPython magic; -o returns the timing result object
    t1 = %timeit -o A+B
    t2 = %timeit -o expanded_sum(A,B)
    t3 = %timeit -o sum_mod_2D(A,B)
    print("expanded_sum is {0:.0f}x slower".format(t2.average / t1.average))
    print(" sum_mod_2D is {0:.0f}x slower".format(t3.average / t1.average))
    print()
10x10 + 10x10
440 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
10.7 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
52.6 µs ± 799 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
expanded_sum is 24x slower
 sum_mod_2D is 119x slower

100x100 + 100x100
4.61 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
20.2 µs ± 546 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
4.33 ms ± 31.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
expanded_sum is 4x slower
 sum_mod_2D is 940x slower

1000x1000 + 1000x1000
1.21 ms ± 16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
5.18 ms ± 149 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
441 ms ± 8.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
expanded_sum is 4x slower
 sum_mod_2D is 366x slower
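The benchmark above relies on the IPython %timeit magic. For anyone who wants to reproduce the comparison in a plain Python script, here is a rough sketch using the standard timeit module. The helper time_call and its repeat/number defaults are assumptions made for illustration, expanded_sum is copied from the post, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

def expanded_sum(A, B):
    # Tile B to cover A, crop to A.shape, then add (as in the post)
    scale = np.ceil(np.divide(A.shape, B.shape)).astype(int)
    F = np.tile(B, scale)
    return A + F[0:A.shape[0], 0:A.shape[1]]

def time_call(fn, *args, repeat=3, number=10):
    """Best observed time per call in seconds (hypothetical helper)."""
    runs = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(runs) / number

rng = np.random.default_rng(0)
for s in (10, 100):
    A = rng.integers(255, size=(s, s))
    B = rng.integers(255, size=(s, s))
    t_plain = time_call(np.add, A, B)
    t_tiled = time_call(expanded_sum, A, B)
    print("{0}x{0}: expanded_sum takes {1:.1f}x the time of A+B".format(s, t_tiled / t_plain))
```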