Question

I am trying to scale a piece of code with input size, and the bottleneck seems to be a call to numpy.where, of which I only use the first true index:

indexs = [numpy.where(_<cump)[0][0] for _ in numpy.random.rand(sample_size)]

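It would be much faster if I could tell numpy to stop after it meets the first true value (I am inverting a cumulative density function, cump, which grows rapidly over its first array values). I could do it manually with a loop and a break, but I am wondering whether there is a Pythonic way of doing this.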

Answer 1

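If cump is a cumulative density function, then it is monotonic and therefore sorted. Rather than scanning it linearly, you get the best performance guarantee from a binary search, which takes O(log n) comparisons per lookup instead of O(n).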

First we create some fake data to search in:

>>> import numpy as np
>>> cump = np.cumsum(np.random.rand(11))
>>> cump -= cump[0]
>>> cump /= cump[-1]
>>> cump
array([ 0.        ,  0.07570573,  0.1417473 ,  0.30536346,  0.36277835,
        0.47102093,  0.54456142,  0.6859625 ,  0.75270741,  0.84691162,  1.        ])

Then we create some fake data to search for:

>>> sample = np.random.rand(5)
>>> sample
array([ 0.19597276,  0.37885803,  0.2096784 ,  0.57559965,  0.72175056])

Finally, we search for it (and find it!):

>>> [np.where(_ < cump)[0][0] for _ in sample]
[3, 5, 3, 7, 8]
>>> np.searchsorted(cump, sample)
array([3, 5, 3, 7, 8], dtype=int64)
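
One detail worth checking: the list comprehension uses a strict comparison (_ < cump), while np.searchsorted defaults to side='left', which returns the first index i with sample <= cump[i]. The two only differ when a sample exactly equals an entry of cump, which is very unlikely for random floats but can happen with hand-picked values. A minimal sketch, assuming cump is sorted and ends at 1.0; the helper name invert_cdf and the seed are illustrative choices, not part of the original code:

```python
import numpy as np

def invert_cdf(cump, sample):
    """For each value in sample, return the first index i with sample < cump[i]."""
    # side='right' reproduces the strict inequality of np.where(_ < cump)[0][0]
    return np.searchsorted(cump, sample, side='right')

rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability
cump = np.cumsum(rng.random(11))
cump -= cump[0]
cump /= cump[-1]
sample = rng.random(1000)

# Linear scan from the question versus vectorized binary search
linear = np.array([np.where(s < cump)[0][0] for s in sample])
binary = invert_cdf(cump, sample)
assert np.array_equal(linear, binary)
```

Because searchsorted accepts the whole sample array at once, the Python-level loop disappears as well, which is likely to matter as much as the binary search when sample_size is large.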
