How do I group by time with xarray and then run a binning function on each group?

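I have a multidimensional "mean direction of total swell" (mdts) netCDF dataset. The dimensions are time (hourly), latitude, and longitude. I simply want to group the hourly data by day and then, for each day and each latitude/longitude grid cell, determine which of 16 predefined direction bins contains the most hours (out of at most 24). The direction value associated with that bin is then assigned as the direction for that grid cell on that day. I am applying a custom function through groupby, and that is where the error occurs. I don't think I understand what gets passed to the function.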

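Note: each netCDF file covers a single calendar month for 1979-2019. I therefore use groupby rather than resample, because resample adds the other 11 months that are not in the file. I also first set all hours to 00:00 so that groupby can group by day.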

Note: my actual code is set up to loop over multiple netCDF files. I have reduced it to a single file. My simplified code:

import numpy as np
import xarray as xr
        
ifile = 'mean_direction_total_swell_Nov_1979_2019_hourly.nc'
        
# min,max,and center values of angle direction bins
min = [348.75,11.25,33.75,56.25,78.75,101.25,123.75,146.25,168.75,191.25,213.75,236.25,258.75,281.25,303.75,326.25]
max = [ 11.25,33.75,56.25,78.75,101.25,123.75,146.25,168.75,191.25,213.75,236.25,258.75,281.25,303.75,326.25,348.75]
dir = [   0.0,22.5,45.0,67.5,90.0,112.5,135.0,157.5,180.0,202.5,225.0,247.5,270.0,292.5,315.0,337.5]
    
# custom function that I think is causing the problem    
def bins(x):
    bins_n = np.zeros([16],dtype=int)
        
    # north bin requires 'or' statement
    if(x >= min[0] or x < max[0]): bins_n[0] = bins_n[0] + 1
        
    # other bins require 'and' statement
    for i in range(1,16,1): # bins
        if(x >= min[i] and x < max[i]):
            bins_n[i] = bins_n[i] + 1
            break
    slot = np.argmax(bins_n)
        
    return dir[slot]
    
   
idatanc = xr.open_dataset(ifile)              
idata = idatanc['mdts']                          
    
idata.coords['time'] = idata.time.dt.floor('1D') # setting all hourly values to 0000 
idata_dy = idata.groupby("time").apply(bins)

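This is what it returns. Note: the error below comes from a program that loops over multiple netCDF files, so it may not correspond exactly to the code above. The error is the same.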

Traceback (most recent call last):

  File "<ipython-input-216-82adffe45690>",line 9,in <module>
    idata_dy = idata.groupby("time").apply(bins)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py",line 815,in apply
    return self.map(func,shortcut=shortcut,args=args,**kwargs)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py",line 800,in map
    return self._combine(applied,shortcut=shortcut)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py",line 819,in _combine
    applied_example,applied = peek_at(applied)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\utils.py",line 183,in peek_at
    peek = next(gen)

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\groupby.py",line 799,in <genexpr>
    applied = (maybe_wrap_array(arr,func(arr,*args,**kwargs)) for arr in grouped)

  File "<ipython-input-215-3d060f71ca15>",line 6,in bins
    if(x >= min[0] or x < max[0]): bins_n[0] = bins_n[0] + 1

  File "C:\Users\TWHawk\Anaconda3\envs\tim_python36\lib\site-packages\xarray\core\common.py",line 119,in __bool__
    return bool(self.values)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
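
The last two frames of the traceback point at the cause: groupby passes the whole group (every hour of one day at every grid cell) to `bins`, so `x >= min[0]` is an array of booleans, and `or` then asks for a single truth value. A minimal sketch of the same failure with a plain NumPy array, plus an element-wise alternative, is shown below. The array shape and random values are made up for illustration; only the two bin edges come from the question.

```python
import numpy as np

# Hypothetical stand-in for one day's group: 24 hourly values on a 2x2 grid.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 360.0, size=(24, 2, 2))

# A comparison on an array yields an array of booleans, not a single bool,
# so using it with `if ... or ...` raises the ValueError seen in the traceback.
try:
    if x >= 348.75 or x < 11.25:
        pass
    raised = False
except ValueError:
    raised = True

# Element-wise alternative: combine the masks with `|`, then count along
# the hour axis to get the number of hours in the north bin per grid cell.
north_mask = (x >= 348.75) | (x < 11.25)
north_hours = north_mask.sum(axis=0)
```

The same idea carries over to the other bins: build one boolean mask per bin and sum it over the time axis instead of testing one scalar at a time.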

Solution

I haven't checked the results thoroughly, but I think the code below does what you need:

import numpy as np
import xarray as xr
from scipy import stats

def func(x,axis):
    # stats.mode already works along an axis and returns (mode, count)
    mode,count = stats.mode(x,axis=axis)
    return mode.squeeze()

infile = 'mean_direction_total_swell_Nov_1979_2019_hourly.nc'

ds = xr.open_dataset(infile)

# make sure range is 0 <= x < 360
ds['mdts'] = np.mod(ds['mdts'],360)

# bin the data in 16 directions (17 actually,North appears as the first and
# last bin)
step = 360 / 16
centers = np.r_[np.r_[0: 360: step],0]
edges = np.r_[0,np.r_[step / 2: 360: step],360]

ds['mdts_binned_idx'] = (ds['mdts'].dims,np.digitize(ds['mdts'],edges))

ds['mdts_binned'] = (ds['mdts'].dims,centers[ds['mdts_binned_idx'] - 1])

# apply stats.mode to get the modal (most common) value in each day
ds2 = xr.Dataset()
ds2['mdts_mode_1d'] = ds['mdts_binned'].resample(time='1D').reduce(func)
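
To check the binning logic without the netCDF file, the steps above can be sketched on a toy array using NumPy only. This is an illustration under stated assumptions, not the code from the answer: the sample values are invented, all directions are assumed to lie in [0, 360) with no NaNs, the upper half of the north bin is folded onto bin 0 with a modulo instead of a repeated centre, and the per-cell mode is found by counting hours per bin rather than with `scipy.stats.mode`.

```python
import numpy as np

step = 360 / 16
centers = np.arange(0, 360, step)                      # 16 centres: 0, 22.5, ..., 337.5
edges = np.r_[0, np.arange(step / 2, 360, step), 360]  # 18 edges, 17 intervals

# Hypothetical data for one day: 24 hourly directions at 2 grid cells.
cell_a = ([0.0, 2.0, 5.0, 10.0, 11.0, 349.0, 350.0, 355.0, 358.0, 359.9]
          + [90.0] * 7 + [200.0] * 7)
cell_b = [200.0] * 20 + [90.0] * 4
day = np.array([cell_a, cell_b]).T                     # shape (24, 2): (hour, cell)

# np.digitize returns 1..17 here; interval 17 is the upper half of the
# north bin, so the modulo folds it back onto bin 0.
idx = (np.digitize(day, edges) - 1) % 16

# Count the hours in each bin for every cell, then pick the fullest bin.
counts = np.stack([(idx == k).sum(axis=0) for k in range(16)])  # shape (16, 2)
daily_dir = centers[counts.argmax(axis=0)]
```

In `cell_a` the north bin wins only because its two halves (five hours on each side of 0 degrees) are counted together. Ties are resolved by `argmax`, which keeps the lowest bin index.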