如何解决优化 zarr 数组处理
我有一个包含 80 个 5-D zarr 文件的列表(mylist),其结构如下(T、F、B、Az、El)。数组的形状为 [24x4096x2016x24x8]。
def GetPolarData(mylist,freq,FreqLo,FreqHi):
'''
This function will take the list of zarr files (T,F,B,Az,El),open them,used selected frequency to return an array
of files with Azimuth and Elevation probabilities
'''
ChanIndx = FreqCut(FreqLo,FreqHi,freq)
if len(ChanIndx) != 0:
MyData = []
for i in range(len(mylist)):
print('Adding file {} : {}'.format(i,mylist[i][32:]))
try:
zarrf = xr.open_zarr(mylist[i],group = 'arr')
m = zarrf.master.sum(dim = ['time','baseline'])
m = m[ChanIndx].sum(dim = ['frequency'])
c = zarrf.counter.sum(dim = ['time','baseline'])
c = c[ChanIndx].sum(dim = ['frequency'])
p = m.astype(float)/c.astype(float)
MyData.append(p)
except Exception as e:
print(e)
continue
else:
print("Something went wrong in Frequency selection")
print("##########################################")
print("This will be contribution to selected band")
print("##########################################")
print(f"Min {np.nanmin(MyData)*100:.3f}% ")
print(f"Max {np.nanmax(MyData)*100:.3f}% ")
print(f"Average {np.nanmean(MyData)*100:.3f}% ")
return(MyData)
FreqLo = 470.
FreqHi = 854.
MyTVData =np.array(GetPolarData(AllZarrList,Freq,FreqHi))
我发现在 40 核、256 GB RAM 上运行需要很长时间(超过 3 小时)
有没有办法让它运行得更快?
谢谢
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。