How to replace a list of values in a numpy array?

Time: 2017-08-17 12:32:41

Tags: python arrays performance numpy

I have an unsorted array of numbers.

I need to replace certain numbers (given in a list) with specific alternatives (given in a corresponding list).

I wrote the following code (which seems to work):

import numpy as np

numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15]  # table, night_stand, plant
alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot

for i in range(len(problem_numbers)):
    idx = numbers == problem_numbers[i]
    numbers[idx] = alternative_numbers[i]

However, this seems very inefficient (for larger arrays, this has to be done millions of times).

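I found this question, which answers a similar problem, but in my case the numbers are not sorted and they need to keep their original positions.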

Note: numbers may contain elements of problem_numbers multiple times, or not at all.

2 answers:

Answer 0: (score: 2)

In case not all problem_values are in numbers, and they may even occur multiple times:

In that case I would just use a dict to hold the values to be replaced and use dict.get to translate the problematic numbers:

replacer = dict(zip(problem_numbers, alternative_numbers))
numbers_list = numbers.tolist()
numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))

Even though it has to go "through Python", this is almost self-explanatory, and it is (probably) not much slower than a NumPy solution.
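To make the fallback behaviour concrete, here is a minimal sketch on a small hand-picked array (the array contents are an assumption chosen for illustration). Passing the list twice to map means each element is used both as the key and as the default, which is the same as calling replacer.get(n, n):

```python
import numpy as np

# Small illustrative input: 15 appears twice, 23 does not appear at all.
numbers = np.array([15, 7, 33, 15, 2])
problem_numbers = [33, 23, 15]
alternative_numbers = [12, 14, 26]

# Map each problem number to its replacement.
replacer = dict(zip(problem_numbers, alternative_numbers))

# replacer.get(n, n) returns the replacement if n is a key,
# otherwise it falls back to n itself, so other values pass through.
replaced = np.array([replacer.get(n, n) for n in numbers.tolist()])

print(replaced.tolist())  # [26, 7, 12, 26, 2]
```

Both occurrences of 15 are replaced, and the unused entry for 23 is simply ignored.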

In case every problem_value is actually present in the numbers array, and only once:

If you have the numpy_indexed package, you can use numpy_indexed.indices:

>>> import numpy_indexed as ni
>>> numbers[ni.indices(numbers, problem_numbers)] = alternative_numbers

This should be quite efficient even for large arrays.
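If the package is not installed, a comparable index lookup can be sketched in plain NumPy with argsort and searchsorted. This is not how numpy_indexed is implemented; find_indices is a hypothetical helper written for illustration, and it assumes every requested value occurs in the array (with duplicates it would return only one of the positions):

```python
import numpy as np

def find_indices(haystack, needles):
    """Return idx such that haystack[idx] == needles (hypothetical helper).

    Assumes every needle is present in haystack; raises KeyError otherwise.
    """
    haystack = np.asarray(haystack)
    needles = np.asarray(needles)
    sorter = np.argsort(haystack)  # indices that sort haystack
    # Binary search for each needle in the sorted view of haystack.
    pos = np.searchsorted(haystack, needles, sorter=sorter)
    # Clamp so that needles larger than every element do not index out of bounds.
    pos = np.minimum(pos, len(haystack) - 1)
    idx = sorter[pos]  # map back to positions in the original array
    if not np.array_equal(haystack[idx], needles):
        raise KeyError("some values are not present in the array")
    return idx

numbers = np.array([5, 33, 2, 23, 15, 8])
problem_numbers = [33, 23, 15]
alternative_numbers = [12, 14, 26]

numbers[find_indices(numbers, problem_numbers)] = alternative_numbers
print(numbers.tolist())  # [5, 12, 2, 14, 26, 8]
```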

Answer 1: (score: 2)

Here is a simple way to do it:

import numpy as np

numbers = np.arange(0,40)
np.random.shuffle(numbers)
problem_numbers = [33, 23, 15]  # table, night_stand, plant
alternative_numbers = [12, 14, 26]  # desk, dresser, flower_pot

# Replace values
problem_numbers = np.asarray(problem_numbers)
alternative_numbers = np.asarray(alternative_numbers)
n_min, n_max = numbers.min(), numbers.max()
replacer = np.arange(n_min, n_max + 1)
mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)  # Discard replacements out of range
replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
numbers = replacer[numbers - n_min]

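This works well and is efficient as long as the range of values in numbers (the difference between the smallest and the largest) is not huge, because the lookup table has one entry per integer in that range (e.g. you don't have a handful of small values next to something like 10000000000).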
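When the value range is too wide for a lookup table, one alternative is to binary-search each element of numbers in the sorted list of problem numbers. The sketch below is a different technique from the answer's lookup table, not a variant of it; replace_values is a hypothetical name, and the code assumes problem_numbers is non-empty and has no duplicates:

```python
import numpy as np

def replace_values(numbers, problem_numbers, alternative_numbers):
    """Replace values without building a table over the whole value range."""
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)

    # Sort the problem numbers and keep the alternatives aligned with them.
    order = np.argsort(problem_numbers)
    sorted_problems = problem_numbers[order]
    sorted_alternatives = alternative_numbers[order]

    # For each element, find where it would sit among the sorted problem numbers.
    pos = np.searchsorted(sorted_problems, numbers)
    pos = np.clip(pos, 0, len(sorted_problems) - 1)  # keep indices in bounds

    # Only elements that exactly equal a problem number are replaced.
    hit = sorted_problems[pos] == numbers
    result = numbers.copy()
    result[hit] = sorted_alternatives[pos[hit]]
    return result

numbers = np.array([1, 7, 10000000000, 7, 3])
out = replace_values(numbers, [7, 10000000000, 99], [70, 5, 0])
print(out.tolist())  # [1, 70, 5, 70, 3]
```

Memory use here depends on the number of elements and replacements rather than on the distance between the smallest and largest value.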

Benchmark

I compared the code in the OP with the three solutions proposed so far, using this code:

import numpy as np

def method_itzik(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    for i in range(len(problem_numbers)):
        idx = numbers == problem_numbers[i]
        numbers[idx] = alternative_numbers[i]
    return numbers

def method_mseifert(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    replacer = dict(zip(problem_numbers, alternative_numbers))
    numbers_list = numbers.tolist()
    numbers = np.array(list(map(replacer.get, numbers_list, numbers_list)))
    return numbers

def method_divakar(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    # Pre-process problem_numbers and correspondingly alternative_numbers
    # such that repeats and no matches are taken care of
    sidx_pn = problem_numbers.argsort()
    pn = problem_numbers[sidx_pn]
    mask = np.concatenate(([True],pn[1:] != pn[:-1]))
    an = alternative_numbers[sidx_pn]

    minN, maxN = numbers.min(), numbers.max()
    mask &= (pn >= minN) & (pn <= maxN)

    pn = pn[mask]
    an = an[mask]

    # Pre-processing done. Now, we need to use pn and an in place of
    # problem_numbers and alternative_numbers respectively. Map, index and assign.
    sidx = numbers.argsort()
    idx = sidx[np.searchsorted(numbers, pn, sorter=sidx)]
    valid_mask = numbers[idx] == pn
    numbers[idx[valid_mask]] = an[valid_mask]
    return numbers

def method_jdehesa(numbers, problem_numbers, alternative_numbers):
    numbers = np.asarray(numbers)
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    n_min, n_max = numbers.min(), numbers.max()
    replacer = np.arange(n_min, n_max + 1)
    mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)  # Discard replacements out of range
    replacer[problem_numbers[mask] - n_min] = alternative_numbers[mask]
    numbers = replacer[numbers - n_min]
    return numbers
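For anyone who wants to time such functions themselves, a minimal harness with timeit could look like the sketch below. It is self-contained, so it redefines compact versions of the loop and the lookup-table approach under the hypothetical names replace_loop and replace_lookup; the array size and repeat counts are arbitrary assumptions, and no timings are claimed here since they depend on the machine:

```python
import timeit
import numpy as np

def replace_loop(numbers, problem_numbers, alternative_numbers):
    # Same logic as the loop in the question, applied to a copy.
    out = numbers.copy()
    for p, a in zip(problem_numbers, alternative_numbers):
        out[out == p] = a
    return out

def replace_lookup(numbers, problem_numbers, alternative_numbers):
    # Lookup table covering the value range of numbers.
    problem_numbers = np.asarray(problem_numbers)
    alternative_numbers = np.asarray(alternative_numbers)
    n_min, n_max = numbers.min(), numbers.max()
    table = np.arange(n_min, n_max + 1)
    mask = (problem_numbers >= n_min) & (problem_numbers <= n_max)
    table[problem_numbers[mask] - n_min] = alternative_numbers[mask]
    return table[numbers - n_min]

rng = np.random.default_rng(0)
numbers = rng.integers(0, 40, size=100_000)
problem_numbers = [33, 23, 15]
alternative_numbers = [12, 14, 26]

# Check that both approaches agree before timing them.
assert np.array_equal(
    replace_loop(numbers, problem_numbers, alternative_numbers),
    replace_lookup(numbers, problem_numbers, alternative_numbers),
)

timings = {}
for func in (replace_loop, replace_lookup):
    runs = timeit.repeat(
        lambda: func(numbers, problem_numbers, alternative_numbers),
        number=10,
        repeat=3,
    )
    timings[func.__name__] = min(runs)  # best of three batches of 10 calls

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 10 calls")
```

Taking the minimum over several repeats reduces noise from other processes; larger arrays and more replacement pairs will shift the balance between the approaches.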