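How to fix performance that is worse only when using 2 cores
I wrote a program that finds the Armstrong numbers in a given range. The problem is that the parallel version is slower than the serial implementation only when I run it with 2 processes.
This is the serial code: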
import sys
import random
from time import perf_counter as pc
import datetime
import time

ARRAYSIZE = int(sys.argv[1])
numbers = [i for i in range(1, ARRAYSIZE + 1)]
random.shuffle(numbers)

start_time = time.time()
armstrong = []
for i in numbers:
    num = i
    result = 0
    n = len(str(i))
    while i != 0:
        digit = i % 10
        result += digit**n
        i //= 10
    if num == result:
        armstrong.append(num)
armstrong.sort()
elapsed_time = time.time() - start_time
print(f"Serial time (Shuffle): {elapsed_time}")
Basically, it checks whether each number in the array is an Armstrong number.
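For reference, the check itself can be isolated into a small function. This is only a sketch of the same digit loop used in the code above; the name `is_armstrong` is a hypothetical helper and does not appear in my programs.

```python
def is_armstrong(num):
    """Return True if num equals the sum of its digits,
    each raised to the power of the number of digits."""
    n = len(str(num))          # number of digits
    total = 0
    remaining = num
    while remaining != 0:
        total += (remaining % 10) ** n   # add last digit to the n-th power
        remaining //= 10                 # drop the last digit
    return total == num


small = [k for k in range(1, 1000) if is_armstrong(k)]
print(small)
# prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 153, 370, 371, 407]
```

For example, 153 qualifies because 1**3 + 5**3 + 3**3 = 153.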
This is the parallel code:
from mpi4py import MPI
import random
import sys
import timeit

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

MASTER = 0
WORKERS = size - 1
ARRAYSIZE = int(sys.argv[1])
CHUNKSIZE = ARRAYSIZE // WORKERS

if rank == MASTER:
    master_time = timeit.default_timer()
    array = []
    for i in range(ARRAYSIZE):
        array.append(i + 1)
    random.shuffle(array)

    # Send one chunk of CHUNKSIZE elements to each worker.
    for i in range(WORKERS):
        sub_array = []
        for j in range(CHUNKSIZE):
            sub_array.append(array[j])
        comm.send(sub_array, dest=(i + 1), tag=1)
        array = array[CHUNKSIZE:len(array)]

    # The master checks whatever is left after the workers' chunks.
    armstrong = []
    for i in array:
        num = i
        result = 0
        n = len(str(num))
        while i != 0:
            digit = i % 10
            result += digit**n
            i //= 10
        if num == result:
            armstrong.append(num)  # append, since num is a single int

    # Collect the partial results from every worker, then merge and sort.
    for i in range(WORKERS):
        get_armstrong_numbers = comm.recv(tag=2)
        armstrong.extend(get_armstrong_numbers)
    armstrong.sort()
    print(f'Master Time (Shuffle): {timeit.default_timer() - master_time}')
elif rank != MASTER:
    worker_time = timeit.default_timer()
    receive = comm.recv(source=0, tag=1)
    armstrong_numbers = []
    for i in range(CHUNKSIZE):
        num = receive[i]
        result = 0
        n = len(str(num))
        while receive[i] != 0:
            digit = receive[i] % 10
            result += digit**n
            receive[i] //= 10
        if num == result:
            armstrong_numbers.append(num)
    comm.send(armstrong_numbers, dest=0, tag=2)
    print(f'Worker time (Shuffle): {timeit.default_timer() - worker_time}')
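All the parallel program does is split the initial array into parts and hand them to the workers, with one part left for the master. When they are done, the workers send their results to the master, and the master merges, sorts, and prints the result.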
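To make the split concrete, the following sketch reproduces only the chunk arithmetic of the parallel code (`CHUNKSIZE = ARRAYSIZE // WORKERS`, with the master keeping what remains after slicing). It does not use MPI, and `partition_sizes` is a hypothetical helper written just for this illustration.

```python
def partition_sizes(array_size, n_procs):
    """Return (elements sent to each worker, elements left for the master),
    mirroring CHUNKSIZE = ARRAYSIZE // WORKERS in the parallel code."""
    workers = n_procs - 1                 # rank 0 is the master
    chunk = array_size // workers         # what each worker receives
    master_share = array_size - chunk * workers  # remainder after slicing
    return chunk, master_share


for n in (2, 3, 4, 5, 6):
    chunk, master_share = partition_sizes(10_000_000, n)
    print(f"-n {n}: each worker gets {chunk}, master keeps {master_share}")
# -n 2: each worker gets 10000000, master keeps 0
# -n 3: each worker gets 5000000, master keeps 0
# -n 4: each worker gets 3333333, master keeps 1
# -n 5: each worker gets 2500000, master keeps 0
# -n 6: each worker gets 2000000, master keeps 0
```

With this scheme the master only ever checks the remainder of the division, and with -n 2 the single worker receives the entire array, which may be relevant to the timings.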
Here are some timings (I just copied and pasted my terminal):
PS C:\Users\danil\Desktop\Armstrong Numbers\2 Cores - Intermediate Times> python .\serial_shuffle.py 10000000
Serial time (Shuffle): 27.026000261306763
PS C:\Users\danil\Desktop\Armstrong Numbers\2 Cores - Intermediate Times> mpiexec -n 2 python .\Shuffle.py 10000000
Master Time (Shuffle): 39.5854112
PS C:\Users\danil\Desktop\Armstrong Numbers\2 Cores - Intermediate Times> mpiexec -n 3 python .\Shuffle.py 10000000
Worker time (Shuffle): 23.1734209
Worker time (Shuffle): 25.0732174
Master Time (Shuffle): 25.073847100000002
Worker time (Shuffle): 17.7299418
Worker time (Shuffle): 19.547413000000002
Worker time (Shuffle): 20.8562403
Master Time (Shuffle): 20.8567713
PS C:\Users\danil\Desktop\Armstrong Numbers\2 Cores - Intermediate Times> mpiexec -n 5 python .\Shuffle.py 10000000
Worker time (Shuffle): 14.8727631
Worker time (Shuffle): 16.3137905
Worker time (Shuffle): 17.071894099999998
Worker time (Shuffle): 18.0240944
Master Time (Shuffle): 18.0242909
PS C:\Users\danil\Desktop\Armstrong Numbers\2 Cores - Intermediate Times> mpiexec -n 6 python .\Shuffle.py 10000000
Worker time (Shuffle): 13.0210344
Worker time (Shuffle): 14.223855599999998
Worker time (Shuffle): 15.3327299
Worker time (Shuffle): 16.0194109
Worker time (Shuffle): 16.786164199999998
Master Time (Shuffle): 16.785847299999997
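The same numbers can be read as speedup (serial time divided by master time) and efficiency (speedup divided by the number of processes). The sketch below uses only the times pasted above. One assumption: the run that shows three worker lines has no command line in the paste, so it is labelled -n 4 here on the basis of three workers plus the master. `speedup_table` is a hypothetical helper name.

```python
SERIAL_TIME = 27.026000261306763

# Master times from the terminal output, keyed by the -n value.
# The -n 4 label is an assumption (three worker lines plus the master).
MASTER_TIMES = {
    2: 39.5854112,
    3: 25.073847100000002,
    4: 20.8567713,
    5: 18.0242909,
    6: 16.785847299999997,
}


def speedup_table(serial_time, parallel_times):
    """Map each process count to (speedup, efficiency)."""
    table = {}
    for n_procs, t in parallel_times.items():
        speedup = serial_time / t
        table[n_procs] = (speedup, speedup / n_procs)
    return table


table = speedup_table(SERIAL_TIME, MASTER_TIMES)
for n_procs, (speedup, efficiency) in table.items():
    print(f"-n {n_procs}: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Only the -n 2 run has a speedup below 1, meaning it is slower than the serial program.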
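A few points:
- Cost of sorting: irrelevant. If I don't shuffle (and therefore don't sort) the array, I get the same behavior
- As you can see, this only happens with -n 2
- The behavior is the same for every array size, from 1000 to 250000000
- I know that I/O operations (such as printing) hinder parallelization
- I use the same method to check for Armstrong numbers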