Getting the coordinates of every element in labeled clusters with numpy

I have a 2D array in which I label clusters using the ndimage.label() function, like this:

import numpy as np
from scipy.ndimage import label

input_array = np.array([[0, 1, 1, 0],
                        [1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])

labeled_array, _ = label(input_array)

# Result:
# labeled_array == [[0, 1, 1, 0],
#                   [1, 1, 0, 0],
#                   [0, 0, 0, 2],
#                   [0, 0, 0, 2]]

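I can get the element count, the centroid, or the bounding box of each labeled cluster. But I would also like to get the coordinates of every element in each cluster. Something like this (the data structure does not have to be exactly this; any data structure will do):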

{
    1: [(0,1),(0,2),(1,0),(1,1)],  # Coordinates of the elements that have the label "1"
    2: [(2,3),(3,3)]  # Coordinates of the elements that have the label "2"
}
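
For context, the per-cluster statistics mentioned above (element count, centroid, bounding box) can be obtained roughly as in the sketch below. It assumes the 4x4 example array implied by the coordinates listed above; the names `num_labels`, `label_ids`, `counts`, `centroids` and `boxes` are illustrative and not part of the original question.

```python
import numpy as np
from scipy import ndimage

# Assumed example input: the 4x4 array implied by the coordinates above.
input_array = np.array([[0, 1, 1, 0],
                        [1, 1, 0, 0],
                        [0, 0, 0, 1],
                        [0, 0, 0, 1]])

labeled_array, num_labels = ndimage.label(input_array)
label_ids = np.arange(1, num_labels + 1)

# Number of elements per cluster (index 0 of bincount is the background).
counts = np.bincount(labeled_array.ravel())[1:]

# Centroid of each cluster as (row, column), weighted by input_array.
centroids = ndimage.center_of_mass(input_array, labeled_array, label_ids)

# Bounding box of each cluster as a tuple of slices, one slice per axis.
boxes = ndimage.find_objects(labeled_array)
```

None of these calls returns the individual member coordinates, which is what the question is after.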

I could loop over the list of labels and call np.where() for each one, but I am wondering whether there is a way to do this without a loop, so that it is faster.

Solution

You can build an array of coordinates, then sort it and split it:

# Get the indexes (coordinates) of the labeled (non-zero) elements
ind = np.argwhere(labeled_array)

# Get the labels corresponding to those indexes above
labels = labeled_array[tuple(ind.T)]

# Sort both arrays so that lower label numbers appear before higher label numbers.
# This is not for cosmetic reasons: the next step applies "diff" to the labels
# and relies on them being sorted.
sort = labels.argsort()
ind = ind[sort]
labels = labels[sort]

# Find the split points where a new label number starts in the ordered label numbers
splits = np.flatnonzero(np.diff(labels)) + 1

# Create a data structure out of the label numbers and indexes (coordinates).
# The first argument to the zip is: we take the 0th label number and the label numbers at the split points
# The second argument is the indexes (coordinates),split at split points
# so the length of both arguments to the zip function is the same
result = {k: v for k,v in zip(labels[np.r_[0,splits]],np.split(ind,splits))}
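
The steps above can be wrapped into one function, as in the following sketch. The function name `coords_by_label` is hypothetical, and two details are additions not present in the answer: `kind="stable"` (the default argsort is not stable, so without it the coordinates within a label are not guaranteed to stay in row-major order) and the early return for an array with no labeled elements. The labeled array is hard-coded here to the result assumed for the question's example.

```python
import numpy as np

def coords_by_label(labeled):
    """Map each non-zero label to an (n, ndim) array of its coordinates."""
    ind = np.argwhere(labeled)               # coordinates of non-zero elements
    if ind.shape[0] == 0:                    # nothing labeled: avoid indexing an empty array
        return {}
    labels = labeled[tuple(ind.T)]           # label value at each coordinate
    order = labels.argsort(kind="stable")    # stable sort keeps row-major order per label
    ind, labels = ind[order], labels[order]
    splits = np.flatnonzero(np.diff(labels)) + 1   # positions where a new label starts
    keys = labels[np.r_[0, splits]]          # first label plus the label at each split
    return {int(k): v for k, v in zip(keys, np.split(ind, splits))}

# Assumed labeled array for the question's example.
labeled_array = np.array([[0, 1, 1, 0],
                          [1, 1, 0, 0],
                          [0, 0, 0, 2],
                          [0, 0, 0, 2]])

result = coords_by_label(labeled_array)

# Optional conversion to the list-of-tuples form used in the question.
as_tuples = {k: [tuple(c) for c in v.tolist()] for k, v in result.items()}
```

Keeping the values as arrays avoids a Python-level loop over coordinates; the conversion to tuples is only needed if that exact structure is required.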

Method 1:

You can try this; it still loops, by way of a dictionary comprehension:

{k: list(zip(*np.where(labeled_array == k))) for k in range(1,3)}
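
The `range(1,3)` above is hard-coded for two labels. A small variation, sketched here under the assumption that the labels are consecutive integers starting at 1 (as produced by `ndimage.label`), derives the upper bound from the array itself. The names `num_labels` and `coords` are illustrative.

```python
import numpy as np

# Assumed labeled array for the question's example.
labeled_array = np.array([[0, 1, 1, 0],
                          [1, 1, 0, 0],
                          [0, 0, 0, 2],
                          [0, 0, 0, 2]])

# Labels are assumed to run from 1 to the maximum value; 0 is the background.
num_labels = int(labeled_array.max())

# Same comprehension as above, with the label range computed instead of fixed.
# Note that the tuple entries are numpy integers, not plain Python ints.
coords = {k: list(zip(*np.where(labeled_array == k)))
          for k in range(1, num_labels + 1)}
```

The second value returned by `label()` is the number of clusters found, so it can be used in place of `labeled_array.max()` when it is still at hand.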

Output:

{1: [(0,1),(0,2),(1,0),(1,1)],2: [(2,3),(3,3)]}

Method 2 (slow):

Here is an approach using Pandas that is probably slower than Mad Physicist's approach:

import pandas as pd

(pd.DataFrame(labeled_array)
  .stack()
  .reset_index()
  .groupby(0).agg(list)[1:]
  .apply(lambda x: list(zip(*x)),axis=1)
).to_dict()

Output:

{1: [(0,1),(0,2),(1,0),(1,1)],2: [(2,3),(3,3)]}

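Timings on this data:

Dictionary comprehension:

8.73 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Coordinate array, sort and split:

57.3 µs ± 5.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Pandas:

5.16 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)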
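
These timings were taken on a very small array with two labels, so the ranking may not carry over to larger inputs. The sketch below is one way to repeat the comparison on bigger data. Everything in it is an assumption made for illustration: the synthetic 300x300 array of random labels (not connected clusters), the helper names `loop_where` and `sort_and_split`, and the use of `timeit`. It makes no claim about which approach wins.

```python
import timeit
import numpy as np

# Synthetic label array (assumption): values 0..50, where 0 is the background.
rng = np.random.default_rng(0)
labeled = rng.integers(0, 51, size=(300, 300))

def loop_where(labeled):
    # One boolean mask and one argwhere call per label.
    return {k: np.argwhere(labeled == k)
            for k in range(1, int(labeled.max()) + 1)}

def sort_and_split(labeled):
    # Single pass: collect coordinates, sort by label, split at label changes.
    ind = np.argwhere(labeled)
    labels = labeled[tuple(ind.T)]
    order = labels.argsort(kind="stable")
    ind, labels = ind[order], labels[order]
    splits = np.flatnonzero(np.diff(labels)) + 1
    return {int(k): v
            for k, v in zip(labels[np.r_[0, splits]], np.split(ind, splits))}

# Check that both approaches produce the same coordinates before timing them.
a = loop_where(labeled)
b = sort_and_split(labeled)
same = a.keys() == b.keys() and all(np.array_equal(a[k], b[k]) for k in a)

t_loop = timeit.timeit(lambda: loop_where(labeled), number=5)
t_split = timeit.timeit(lambda: sort_and_split(labeled), number=5)
print(f"agree: {same}, loop: {t_loop:.4f} s, sort/split: {t_split:.4f} s")
```

Varying the array size and the number of labels should show how each approach scales on the data at hand.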