Question

I have a huge 3D array to process. I want to relabel its elements as follows:

import numpy as np
given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
required_array = np.array([0, 0, 0, 1, 1,  2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

I know there is a relabel_sequential function in skimage.segmentation, but it is too slow for my purposes. Any ideas for a faster way would be appreciated.
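To pin down the mapping being asked for, here is a deliberately slow reference sketch. The name relabel_reference is hypothetical and does not come from the question. Every distinct value is replaced by its 0-based rank among the sorted distinct values, for an array of any shape (including 3D). It is only meant for checking the output of the faster approaches on small inputs.

```python
import numpy as np

def relabel_reference(arr):
    """Hypothetical, slow reference: map each distinct value to its 0-based
    rank among the sorted distinct values. Works for any array shape."""
    arr = np.asarray(arr)
    # Sorted distinct values -> consecutive labels 0, 1, 2, ...
    mapping = {v: i for i, v in enumerate(sorted(set(arr.ravel().tolist())))}
    # Apply the mapping element by element (pure Python, so slow on big arrays)
    flat = [mapping[v] for v in arr.ravel().tolist()]
    # Restore the original shape, e.g. for a 3D input
    return np.array(flat, dtype=np.intp).reshape(arr.shape)

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
print(relabel_reference(given_array))
# [0 0 0 1 1 2 2 2 3 3 3 3 3 4 4 4]
```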

Answer 1

The fastest approach is likely to write a dedicated numba function tailored to your needs.

Example

from numba import njit
import numpy as np

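@njit()
def relabel(array):
    # Relabels a sorted 1-D array in place with consecutive labels 0, 1, 2, ...
    i = 0
    n = -1          # current label; incremented whenever a new value starts
    previous = 0    # last original value seen (assumes the first value is not 0)
    while i < len(array):
        if previous != array[i]:
            # A new value starts here, so move on to the next label
            previous = array[i]
            n += 1
        array[i] = n
        i += 1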

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

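This example makes several assumptions about the input: the array is sorted, the first number is positive (strictly speaking, nonzero, because previous starts at 0), the array is one-dimensional, and it is fine to overwrite the array in place.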
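When the input really is sorted, the same idea (start a new label wherever the value changes) can also be written in plain NumPy without numba. The following is a minimal sketch; relabel_sorted is a hypothetical name. Unlike the numba version, it returns a new array instead of overwriting the input, and it does not depend on the first value being nonzero. It still assumes a 1-D input in which equal values are adjacent.

```python
import numpy as np

def relabel_sorted(arr):
    """Hypothetical helper: relabel a sorted 1-D array to 0, 1, 2, ...
    Assumes equal values are adjacent. Returns a new array."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return np.zeros(0, dtype=np.intp)
    # True wherever an element differs from its predecessor (a new group starts)
    starts = arr[1:] != arr[:-1]
    # Label of element i = number of group starts at or before position i
    return np.concatenate(([0], np.cumsum(starts)))

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
print(relabel_sorted(given_array))
# [0 0 0 1 1 2 2 2 3 3 3 3 3 4 4 4]
```

This does one comparison pass and one cumulative sum, so it stays linear in the array size, but it allocates a couple of temporary arrays that the in-place numba loop avoids.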

Answer 2

Try this and see whether it is fast enough: call numpy.unique with return_inverse=True and use the inverse array it returns:

In [52]: given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])

In [53]: u, inv = np.unique(given_array, return_inverse=True)

In [54]: inv
Out[54]: array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
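Since the question is about a 3D array, here is a sketch of the same np.unique approach on a made-up 3D volume (random data, chosen only for illustration). np.unique sorts the values internally, so the input does not need to be sorted, and every element keeps its position. Whether inv comes back flat or already in the input's shape depends on the NumPy version, so the sketch reshapes it explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-D input: arbitrary (unsorted) integer labels
volume = rng.choice([1, 3, 5, 8, 23], size=(4, 5, 6))

uniq, inv = np.unique(volume, return_inverse=True)
# `inv` may be flat or already shaped like the input depending on the NumPy
# version; reshaping makes the result independent of that.
relabeled = inv.reshape(volume.shape)

# Each label indexes back into the sorted unique values
assert np.array_equal(uniq[relabeled], volume)
```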

Answer 3

If the given array is not sorted, this is faster than sorting it first:

from numba import njit
import numpy as np

@njit()
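def relabel_fast(array, count):
    # Counting-sort style relabel; values must be non-negative integers.
    # Note: the labels are written back in sorted order, not in the input's order.
    i = 0
    while i < len(array):
        # Count how many times each value occurs
        data = array[i]
        count[data] += 1
        i += 1
    a = 0 # Position in count (start at 0 so that the value 0 is handled too)
    b = 0 # Position in array
    c = 0 # The current output number
    while a < len(count):
        d = 0 # The number of 'c' to output
        if count[a] > 0:
            # Write label c once for every occurrence of value a
            while d < count[a]:
                array[b] = c
                b += 1
                d += 1
            c += 1
        a += 1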

def relabel(given_array):
    # Create the count array (one slot per value from 0 to the maximum) and pass it in.
    # Numba's nopython mode also supports np.zeros, so it could be created inside relabel_fast instead.
    count = np.zeros(np.max(given_array) + 1, dtype=int)
    relabel_fast(given_array, count)


#given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
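Note that relabel_fast writes the labels back in sorted order: the output for the unsorted input above is identical to the one for the sorted input, so the position of each original element is lost. If positions matter, the counting idea can be kept by turning the counts into a lookup table instead. The sketch below (relabel_lut is a hypothetical name) swaps in a vectorized NumPy lookup table rather than numba. It assumes non-negative integer values and uses memory proportional to the largest value.

```python
import numpy as np

def relabel_lut(arr):
    """Hypothetical helper: relabel non-negative integers to 0..k-1 while
    keeping every element in its original position. Works for any shape."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return np.zeros(arr.shape, dtype=np.intp)
    # Mark which values occur (plays the role of the `count` array above)
    present = np.zeros(arr.max() + 1, dtype=bool)
    present[arr] = True
    # lut[v] = number of distinct present values smaller than v
    lut = np.cumsum(present) - 1
    # Fancy indexing applies the lookup table element-wise, preserving shape
    return lut[arr]

given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
print(relabel_lut(given_array))
# [0 4 0 1 3 1 2 2 3 3 3 2 3 4 4 0]
```

Because it relies on fancy indexing, the same call works unchanged on a 3D array, and it needs no sort.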

Fast way to relabel array elements, or make them consecutive, in Python
