Fast way to relabel array elements or make them contiguous in Python

Time: 2019-12-24 22:57:14

Tags: arrays numpy image-processing scipy numba

I have a huge 3D array to process. I want to relabel its elements in the following way:

import numpy as np
given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
required_array = np.array([0, 0, 0, 1, 1,  2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

I know there is a relabel_sequential function in skimage.segmentation, but it is too slow for my purposes. Any ideas for a faster way would be appreciated.

3 answers:

Answer 0 (score: 2)

The fastest approach is probably to write a dedicated numba function tailored to your needs.

Example

from numba import njit
import numpy as np

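@njit()
def relabel(array):
    # Relabel a sorted 1-D array in place with consecutive labels 0, 1, 2, ...
    i = 0
    n = -1          # current label; becomes 0 at the first value change
    previous = 0    # last value seen; assumes the first element is not 0
    while i < len(array):
        if previous != array[i]:
            # A new run of equal values starts here, so move to the next label.
            previous = array[i]
            n += 1
        array[i] = n
        i += 1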

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])

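This example makes several assumptions about the input: the array is sorted, its first value is not 0 (because `previous` starts at 0, a leading 0 would be labeled -1), it is one-dimensional, and it may be overwritten in place.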
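If some of those assumptions are inconvenient, the same run-detection idea can be sketched in plain NumPy. This is only an illustration, not part of the original answer: the function name `relabel_sorted` is hypothetical, the input is still assumed to be sorted and one-dimensional, and no claim is made about its speed relative to the numba version. It returns a new array instead of overwriting the input, and it does not depend on the first value.

```python
import numpy as np

def relabel_sorted(array):
    """Return consecutive labels for a sorted 1-D array without modifying it."""
    array = np.asarray(array)
    if array.size == 0:
        return np.zeros(0, dtype=np.intp)
    # True wherever a value differs from its left neighbour, i.e. a new run starts.
    starts = array[1:] != array[:-1]
    labels = np.zeros(array.size, dtype=np.intp)
    # The running count of run starts is the label of each element.
    labels[1:] = np.cumsum(starts)
    return labels

given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
print(relabel_sorted(given_array))
# [0 0 0 1 1 2 2 2 3 3 3 3 3 4 4 4]
```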

Answer 1 (score: 1)

Give this a try and see whether it is fast enough. Call numpy.unique with the argument return_inverse=True and use the inverse array it returns:

In [52]: given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])

In [53]: u, inv = np.unique(given_array, return_inverse=True)

In [54]: inv
Out[54]: array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
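Since the question is about a 3D array, here is a small sketch of the same call on a multi-dimensional input. The array `volume` is made-up sample data. As far as I know, the shape of the inverse for multi-dimensional input has differed between NumPy versions (flattened in some, input-shaped in others), so reshaping explicitly is a safe way to get the labels back in the original layout. Unlike the numba example above, this does not require sorted input.

```python
import numpy as np

# Hypothetical 2x2x2 sample volume with non-consecutive labels.
volume = np.array([[[1, 23], [1, 3]],
                   [[8, 3], [5, 5]]])

uniques, inverse = np.unique(volume, return_inverse=True)

# Reshape explicitly so the result matches the input layout regardless of
# whether this NumPy version returns the inverse flattened or already shaped.
relabeled = inverse.reshape(volume.shape)

print(uniques)    # the sorted distinct values
print(relabeled)  # same shape as volume, labels 0..len(uniques)-1
```

Indexing `uniques[relabeled]` reconstructs the original volume, which is a quick way to check the mapping.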

Answer 2 (score: 1)

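If the given array is not sorted, this will be faster than sorting it first. It counts how often each value occurs (a counting sort), so the values must be non-negative integers, and the labels are written out in sorted order: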

from numba import njit
import numpy as np

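@njit()
def relabel_fast(array, count):
    # First pass: count how often each value occurs (values index into count).
    i = 0
    while i < len(array):
        data = array[i]
        count[data] += 1
        i += 1
    a = 0 # Position in count (start at 0 so that a value of 0 is handled too)
    b = 0 # Position in array
    c = 0 # The current output number
    # Second pass: overwrite the array with each label repeated count[a] times.
    while a < len(count):
        d = 0 # The number of 'c' to output
        if count[a] > 0:
            while d < count[a]:
                array[b] = c
                b += 1
                d += 1
            c += 1
        a += 1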

def relabel(given_array):
    # Create the count array here and pass it to the Numba function.
    # (Numba also supports np.zeros inside jitted code, so this is a design choice.)
    count = np.zeros(np.max(given_array) + 1, dtype=int)
    relabel_fast(given_array, count)


#given_array = np.array([1, 1, 1, 3, 3, 5, 5, 5, 8, 8, 8, 8, 8, 23, 23, 23])
given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
relabel(given_array)

given_array

Output

array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4])
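Note that this output lists the labels in sorted order, so the original positions of the values are not preserved. If each element should instead get its new label in place, one possible variation is a lookup table built from the same "which values occur" idea. The sketch below is not from the original answer: the name `relabel_with_lookup` is hypothetical, it uses plain NumPy rather than numba, and it assumes a non-empty array of non-negative integers whose maximum is small enough to allocate a table of that size.

```python
import numpy as np

def relabel_with_lookup(array):
    """Map each value to its rank among the distinct values, keeping positions."""
    array = np.asarray(array)
    # present[v] is True if the value v occurs anywhere in the array.
    present = np.zeros(array.max() + 1, dtype=bool)
    present[array.ravel()] = True
    # For a value v that occurs, cumsum(present)[v] - 1 is the number of
    # smaller distinct values, which is exactly its new label.
    lookup = np.cumsum(present) - 1
    # Fancy indexing applies the table element-wise and keeps the input shape.
    return lookup[array]

given_array = np.array([1, 23, 1, 3, 8, 3, 5, 5, 8, 8, 8, 5, 8, 23, 23, 1])
print(relabel_with_lookup(given_array))
# [0 4 0 1 3 1 2 2 3 3 3 2 3 4 4 0]
```

Because the lookup is applied with fancy indexing, the same function also works on a 3D array without flattening it first.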