Numpy: Fastest way to insert a value into an array such that the array stays in order

Posted: 2018-02-13 22:42:23

Tags: python sorting numpy concatenation

Suppose I have an array my_array and a single value my_val. (Note that my_array is always sorted.)

my_array = np.array([1, 2, 3, 4, 5])
my_val = 1.5

Because my_val is 1.5, I want to put it between 1 and 2, giving me the array [1, 1.5, 2, 3, 4, 5].

My question is: what is the fastest way (i.e. measured in microseconds) to produce the ordered output array as my_array grows arbitrarily large?

My original approach was to concatenate the value to the original array and then sort:

arr_out = np.sort(np.concatenate((my_array, np.array([my_val]))))
# arr_out: [ 1.   1.5  2.   3.   4.   5. ]

I know that np.concatenate is fast, but I am not sure how np.sort scales as my_array grows, even though my_array will always be sorted.
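One way to look at the scaling question empirically is to time both strategies at several array lengths. The sketch below is a hypothetical harness, not the benchmark used in this question: the helper names insert_by_sort, insert_by_searchsorted and time_both, the sizes, and the repetition count are all illustrative assumptions. With the default kind, np.sort does roughly O(n log n) work on every call, while the searchsorted version does an O(log n) search plus one O(n) copy, so the gap would be expected to widen as n grows; actual numbers depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def insert_by_sort(arr, val):
    # The question's approach: append the value, then sort the whole array again.
    return np.sort(np.concatenate((arr, np.array([val]))))


def insert_by_searchsorted(arr, val):
    # Binary search for the insertion point (O(log n)),
    # then one concatenate that copies all n elements (O(n)).
    idx = arr.searchsorted(val)
    return np.concatenate((arr[:idx], [val], arr[idx:]))


def time_both(sizes=(10**3, 10**4, 10**5), number=100, val=1.5):
    """Return {n: (sort_seconds, searchsorted_seconds)} for each array length n."""
    results = {}
    for n in sizes:
        arr = np.arange(n, dtype=np.float64)  # already sorted, like my_array
        t_sort = timeit.timeit(lambda: insert_by_sort(arr, val), number=number)
        t_ss = timeit.timeit(lambda: insert_by_searchsorted(arr, val), number=number)
        results[n] = (t_sort, t_ss)
    return results


if __name__ == "__main__":
    for n, (t_sort, t_ss) in time_both().items():
        print(f"n={n:>7}: sort {t_sort:.4f}s, searchsorted {t_ss:.4f}s")
```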

Edit:

I have compiled the timings of the various methods that had been posted by the time I accepted an answer:

Input:

import timeit

timeit_setup = 'import numpy as np\n' \
               'my_array = np.array([i for i in range(1000)], dtype=np.float64)\n' \
               'my_val = 1.5'
num_trials = 1000

my_time = timeit.timeit(
    'np.sort(np.concatenate((my_array, np.array([my_val]))))',
    setup=timeit_setup, number=num_trials
)

pauls_time = timeit.timeit(
    'idx = my_array.searchsorted(my_val)\n'
    'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
    setup=timeit_setup, number=num_trials
)

sanchit_time = timeit.timeit(
    'np.insert(my_array, my_array.searchsorted(my_val), my_val)',
    setup=timeit_setup, number=num_trials
)

print('Times for 1000 repetitions for array of length 1000:')
print("My method took {}s".format(my_time))
print("Paul Panzer's method took {}s".format(pauls_time))
print("Sanchit Anand's method took {}s".format(sanchit_time))

Output:

Times for 1000 repetitions for array of length 1000:
My method took 0.017865657746239747s
Paul Panzer's method took 0.005813951002013821s
Sanchit Anand's method took 0.014003945532323987s

For 100 repetitions with an array of length 1,000,000:

2 answers:

Answer 0 (score: 3)

Use np.searchsorted to find the insertion point in logarithmic time, then build the new array with np.concatenate:

>>> idx = my_array.searchsorted(my_val)
>>> np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
array([1. , 1.5, 2. , 3. , 4. , 5. ])
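Only the search step is logarithmic: np.concatenate allocates a new array and copies all n elements, so each insertion is still O(n) overall (NumPy arrays cannot grow in place). The sketch below wraps the two lines above in a helper (insert_sorted is a hypothetical name) and checks it against the question's concatenate-then-sort approach on random sorted inputs; the seed and input sizes are arbitrary choices.

```python
import numpy as np


def insert_sorted(arr, val):
    # Binary search for the insertion point: O(log n) comparisons.
    idx = arr.searchsorted(val)
    # Building the result is still O(n): concatenate copies every element
    # into a freshly allocated array.
    return np.concatenate((arr[:idx], [val], arr[idx:]))


# Sanity check against the concatenate-then-sort approach on random sorted data,
# including empty arrays and values that land at either end.
rng = np.random.default_rng(0)
for _ in range(100):
    arr = np.sort(rng.random(rng.integers(0, 50)))
    val = rng.random()
    expected = np.sort(np.concatenate((arr, np.array([val]))))
    assert np.array_equal(insert_sorted(arr, val), expected)

print(insert_sorted(np.array([1, 2, 3, 4, 5]), 1.5).tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```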

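Note 1: I recommend having a look at the insights from @Willem Van Onsem and @hpaulj.

Note 2: If all the dtypes match from the outset, np.insert, as suggested by @Sanchit Anand, may be slightly more convenient. It is worth mentioning, however, that this convenience comes at a significant overhead: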

>>> def f_pp(my_array, my_val):
...      idx = my_array.searchsorted(my_val)
...      return np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
... 
>>> def f_sa(my_array, my_val):
...      return np.insert(my_array, my_array.searchsorted(my_val), my_val)
...
>>> my_farray = my_array.astype(float)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100000)
>>> repeat('f_sa(my_farray, my_val)', **kwds)
[1.2453778409981169, 1.2268288589984877, 1.2298014000116382]
>>> repeat('f_pp(my_array, my_val)', **kwds)
[0.2728819379990455, 0.2697303680033656, 0.2688361559994519]

Answer 1 (score: 3)

my_array = np.insert(my_array, my_array.searchsorted(my_val), my_val)

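[EDIT] Make sure the array's dtype is float32 or float64, or add a decimal point to one of the list elements when creating it. Otherwise np.insert converts my_val to the array's integer dtype and inserts 1 instead of 1.5.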
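The dtype caveat in the [EDIT] note can be seen directly: np.insert converts the inserted values to the target array's dtype. The sketch below, in which the variable names are illustrative, contrasts an integer array, a float array created with a decimal point, and an integer array converted with astype before the insert.

```python
import numpy as np

my_val = 1.5
int_array = np.array([1, 2, 3, 4, 5])      # integer dtype
float_array = np.array([1., 2, 3, 4, 5])   # one decimal point makes the whole array float64

# np.insert converts the inserted value to the target array's dtype,
# so on an integer array 1.5 is truncated to 1.
truncated = np.insert(int_array, int_array.searchsorted(my_val), my_val)

# On a float array the value survives unchanged.
kept = np.insert(float_array, float_array.searchsorted(my_val), my_val)

# An existing integer array can also be converted first.
converted = int_array.astype(np.float64)
kept_too = np.insert(converted, converted.searchsorted(my_val), my_val)

print(truncated.tolist())  # [1, 1, 2, 3, 4, 5]
print(kept.tolist())       # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
print(kept_too.tolist())   # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```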