Question

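I want to generate, from a vector (for simplicity, let's call it "serie1"), another vector of dimension 1000x1, where each element of this new vector is the sum of j random elements of the vector "serie1".

I was thinking of creating a random matrix of dimension 1000xj from the vector and summing it horizontally.

How would you suggest doing this in Python?

To get a random vector, I can do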

Vector=np.random.choice(serie1, 1000, replace=True)

but I don't know how to continue from there, or whether there is an efficient solution.

Answer 1

You're close:

vector = np.random.choice(serie1, (1000, j), replace=True).sum(axis=-1, keepdims=True)

Note that this samples with replacement.

For j that is not too large, you can apply an accept-reject scheme to eliminate duplicates:
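To see what "with replacement" means in practice, here is a minimal sketch (using NumPy's newer `default_rng` Generator rather than the legacy `np.random` functions in this answer; the function names are illustrative) that draws index rows with replacement and measures how often a row contains a repeated index, compared with the exact probability 1 - prod_{k<j} (1 - k/M).

```python
import numpy as np

def duplicate_row_fraction(M, j, N=100_000, seed=0):
    """Fraction of N rows of j indices drawn from range(M) with replacement
    that contain at least one repeated index."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, M, size=(N, j))
    idx.sort(axis=1)
    # After sorting, a repeated index shows up as a zero difference between neighbours
    has_dup = np.any(np.diff(idx, axis=1) == 0, axis=1)
    return has_dup.mean()

def duplicate_row_probability(M, j):
    """Exact probability that j draws with replacement from M items are not all distinct."""
    k = np.arange(j)
    return 1.0 - np.prod(1.0 - k / M)

if __name__ == "__main__":
    M, j = 20, 5
    print("empirical:", duplicate_row_fraction(M, j))
    print("exact:    ", duplicate_row_probability(M, j))
```

For M = 20 and j = 5 the exact value is 1 - 0.95·0.9·0.85·0.8 = 0.4186, so roughly 42% of the rows would contain at least one element counted twice.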

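import numpy as np

def accept_reject(serie1, j):
    serie1 = np.asarray(serie1)
    efficiency_ratio = 2  # just a guess
    M = len(serie1)
    # Probability that j draws with replacement are all distinct: prod_{k<j} (1 - k/M)
    accept_rate = np.prod(np.linspace(1 - (j - 1) / M, 1, j))
    # Draws needed for ~1000 accepted rows, plus a safety margin
    n_draw = int(1000 / accept_rate + 4 * np.sqrt(1000 * (1 - accept_rate)))
    # Rejection sampling would cost roughly more than a full (1000, M) approach: fall back
    if n_draw * j * efficiency_ratio > 1000 * M:
        return use_other_solution(serie1, j)  # placeholder, not defined in this answer
    raw = np.random.randint(0, M, (n_draw, j))
    raw.sort(axis=-1)
    # After sorting, rows with strictly increasing indices contain no duplicates
    raw = raw[np.all(np.diff(raw, axis=-1) > 0, axis=-1), :]
    if len(raw) > 1000:
        raw = raw[:1000, :]
    elif len(raw) < 1000:
        return use_other_solution(serie1, j)  # placeholder, not defined in this answer
    return serie1[raw].sum(axis=-1, keepdims=True)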
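The `accept_rate` line relies on `np.linspace(1-(j-1)/M, 1, j)` enumerating the factors (M - k)/M for k = 0..j-1, so their product is the probability that j draws with replacement are all distinct. A small sketch (helper names are illustrative) that checks this against the falling-factorial form M! / ((M - j)! · M^j), computed with `math.perm` (Python 3.8+):

```python
import math
import numpy as np

def accept_rate_linspace(M, j):
    # The expression used in accept_reject
    return np.prod(np.linspace(1 - (j - 1) / M, 1, j))

def accept_rate_exact(M, j):
    # Ordered j-tuples with distinct entries divided by all ordered j-tuples
    return math.perm(M, j) / M**j

for M, j in [(20, 5), (100, 10), (1000, 3)]:
    print(M, j, accept_rate_linspace(M, j), accept_rate_exact(M, j))
```

For M = 20 and j = 5 the rate is about 0.58, so on average roughly 1000 / 0.58 ≈ 1720 rows have to be drawn to keep 1000 of them.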

Answer 2

Plain Python (sampling without replacement):


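from random import sample
vector = [sum(sample(list(serie1), j)) for _ in range(1000)]

With NumPy, if sampling with replacement is acceptable, the loop can be replaced by a single vectorized draw of a (1000, j) matrix with np.random.choice(serie1, (1000, j), replace=True), summed along axis 1.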
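For reference, a sketch of both variants using NumPy's `Generator` API (`np.random.default_rng`) instead of the `random` module; the function names are illustrative. The with-replacement version is one vectorized draw. The without-replacement version calls `Generator.choice(..., replace=False)` once per row, because passing `size=(N, j)` together with `replace=False` would make all N·j picks distinct across the whole matrix, not just within each row.

```python
import numpy as np

def with_replacement(serie1, j, N=1000, seed=None):
    """Vectorized: draw an (N, j) matrix of elements with replacement and sum each row."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.asarray(serie1), size=(N, j), replace=True).sum(axis=1)

def without_replacement_loop(serie1, j, N=1000, seed=None):
    """j distinct positions per row; still a Python-level loop over the N rows."""
    rng = np.random.default_rng(seed)
    serie1 = np.asarray(serie1)
    return np.array([rng.choice(serie1, size=j, replace=False).sum() for _ in range(N)])

a = with_replacement(np.arange(20), 5, seed=0)
b = without_replacement_loop(np.arange(20), 5, seed=0)
print(a.shape, b.shape)  # (1000,) (1000,)
```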

Answer 3

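The basic problem is getting j unique elements for each of the 1000 rows. We can't use np.random.choice(....., replace=True) directly, because then the j elements in a row won't necessarily be unique. To solve this, one vectorized approach is to create a random matrix of shape (1000, len(input_array)), argsort it along the second axis to get j unique indices per row, index into the input array with those indices, and finally sum along the second axis.

There are two ways to implement it -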

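import numpy as np

def app1(serie1, j, N=1000):
    # Argsorting a row of random keys gives a random permutation of 0..len(serie1)-1;
    # its first j entries are j distinct indices
    idx = np.random.rand(N, serie1.size).argsort(1)[:, :j]
    return serie1[idx].sum(1)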
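An alternative sketch of the same idea using `Generator.permuted` (NumPy 1.20+), which shuffles each row of an index matrix independently of the others; taking the first j columns then gives j distinct indices per row. This swaps the argsort-of-random-keys trick for an explicit per-row shuffle and is not part of the original answer; `app_permuted` is an illustrative name.

```python
import numpy as np

def app_permuted(serie1, j, N=1000, seed=None):
    """Sum of j distinct random elements of serie1, for each of N rows."""
    rng = np.random.default_rng(seed)
    serie1 = np.asarray(serie1)
    # N copies of 0..M-1, each row shuffled independently
    idx = rng.permuted(np.tile(np.arange(serie1.size), (N, 1)), axis=1)[:, :j]
    return serie1[idx].sum(axis=1)

vector = app_permuted(np.arange(20), 5, seed=0)
print(vector.shape)  # (1000,)
```

Like `app1`, this still materializes an (N, len(serie1)) matrix, so memory grows with the length of the input array.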

Use the efficient np.argpartition to select the j random elements, then np.take for efficient indexing -

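def app2(serie1, j, N=1000):
    # argpartition only orders around position kth, which is cheaper than a full sort;
    # kth=j-1 puts the j smallest keys (in arbitrary order) in the first j columns
    # and, unlike kth=j, also works when j == serie1.size
    idx = np.random.rand(N, serie1.size).argpartition(j - 1, axis=1)[:, :j]
    return np.take(serie1, idx).sum(1)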
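To check that the partition-based selection picks the same j indices as the full argsort, here is a small sketch on one shared matrix of random keys. `argpartition` only guarantees that the first j columns hold the j smallest keys in arbitrary order, so the rows are compared after sorting. Ties between keys would make the two differ, but they practically never occur with floating-point uniforms.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, j = 1000, 20, 5
keys = rng.random((N, M))

by_sort = keys.argsort(axis=1)[:, :j]
# kth=j-1 puts the j smallest keys in the first j columns (unordered)
by_partition = keys.argpartition(j - 1, axis=1)[:, :j]

same = np.array_equal(np.sort(by_sort, axis=1), np.sort(by_partition, axis=1))
print(same)  # True
```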

Sample run demonstrating the creation of the indices idx -

In [35]: serie1 = np.random.randint(0,9,(20))

In [36]: idx = np.random.rand(1000,serie1.size).argsort(1)[:,:5]

In [37]: idx
Out[37]: 
array([[16, 13, 19,  0, 15],
       [ 7,  4, 13, 15, 14],
       [ 8,  3, 15,  1,  9],
       ..., 
       [11, 15, 17,  4, 19],
       [19,  0,  3,  7,  9],
       [10,  1, 19, 12,  6]])

Verifying uniform random sampling -

In [81]: serie1 = np.arange(20)

In [82]: j = 5

In [83]: idx = np.random.rand(1000000,serie1.size).argsort(1)[:,:j]

In [84]: np.bincount(idx.ravel())
Out[84]: 
array([250317, 250298, 250645, 249544, 250396, 249972, 249492, 250512,
       249968, 250133, 249622, 250170, 250291, 250060, 250102, 249446,
       249398, 249003, 250249, 250382])

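With fairly equal counts across all 20 elements of the input array (each index is expected about 1,000,000 × 5 / 20 = 250,000 times), I'd say the distribution is quite uniform.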
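A scaled-down version of the same check as a standalone script, reporting the largest relative deviation of the counts from the expected value N·j/M; the sizes and seed are arbitrary choices, not from the original run.

```python
import numpy as np

def max_relative_deviation(M=20, j=5, N=100_000, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.random((N, M)).argsort(axis=1)[:, :j]
    counts = np.bincount(idx.ravel(), minlength=M)
    expected = N * j / M
    return counts, np.abs(counts / expected - 1).max()

counts, dev = max_relative_deviation()
print(counts.sum(), dev)  # counts.sum() is always N * j = 500000
```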

Runtime test -

In [140]: serie1 = np.random.randint(0,9,(20))

In [141]: j = 5

# @elcombato's soln
In [142]: %timeit [sum(sample(serie1, j)) for _ in range(1000)]
100 loops, best of 3: 10.7 ms per loop

# Posted solutions in this post
In [143]: %timeit app1(serie1, j, N=1000)
     ...: %timeit app2(serie1, j, N=1000)
     ...: 
1000 loops, best of 3: 943 µs per loop
1000 loops, best of 3: 870 µs per loop
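The `%timeit` lines above are IPython magics. Outside IPython, a roughly equivalent comparison can be sketched with the standard-library `timeit` module. `app1`/`app2` are redeclared so the snippet runs on its own (`app2` with `kth=j - 1`), and the array is converted with `list()` before calling `random.sample`, since in Python 3 it expects a sequence and NumPy arrays are not registered as one. Absolute timings depend on the machine and library versions.

```python
import timeit
from random import sample

import numpy as np

def app1(serie1, j, N=1000):
    idx = np.random.rand(N, serie1.size).argsort(1)[:, :j]
    return serie1[idx].sum(1)

def app2(serie1, j, N=1000):
    idx = np.random.rand(N, serie1.size).argpartition(j - 1, axis=1)[:, :j]
    return np.take(serie1, idx).sum(1)

def pure_python(serie1, j, N=1000):
    population = list(serie1)  # random.sample needs a sequence, not an ndarray
    return [sum(sample(population, j)) for _ in range(N)]

serie1 = np.random.randint(0, 9, 20)
j = 5

results = {}
for name, fn in [("sample", pure_python), ("app1", app1), ("app2", app2)]:
    # Best of 3 repeats of 20 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(serie1, j), number=20, repeat=3)) / 20
    results[name] = best
    print(f"{name}: {best * 1e3:.3f} ms per call")
```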

Python: random matrix from an array
