Question

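I am using Python with numpy and want to generate pairs of random numbers. I want to exclude repeated outcomes, i.e. a pair whose two entries are both the same as in a pair already drawn, but I want to include pairs that share only one entry with an earlier pair. I tried to use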

import numpy
numpy.random.choice(a,(m,n),replace=False)

for this, but it completely excludes any tuples with identical entries, i.e.

import numpy
numpy.random.choice(2, size=(2, 1), replace=False)

only gives me (1,0) and (0,1), and not all of (1,1), (0,0), (1,0) and (0,1).

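I want to do this because I want to draw a sample of random tuples with a large a and a large n (as above), without getting exactly the same tuple more than once. It should also be reasonably efficient. Is there a method that already implements this?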
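The behaviour described above comes from replace=False applying to every element of the output array, not to each row separately. The snippet below is only a demonstration of that behaviour with the legacy np.random API the question uses, not a solution to the question.

```python
import numpy as np

# replace=False forbids reusing any value anywhere in the output array,
# not just within a row: 2 draws from range(2) must use 0 and 1 once each.
column = np.random.choice(2, size=(2, 1), replace=False)
print(column.ravel())  # [0 1] or [1 0], never [0 0] or [1 1]

# With replace=True, pairs such as (0, 0) and (1, 1) become possible, but
# nothing stops the same pair from being drawn twice.
pairs = np.random.choice(2, size=(4, 2), replace=True)
print(pairs)
```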

Answer 1

A generator of random unique coordinates:

from random import randint

def gencoordinates(m, n):
    seen = set()  # every pair yielded so far

    x, y = randint(m, n), randint(m, n)

    while True:
        seen.add((x, y))
        yield (x, y)
        # Redraw until we get a pair that has not been yielded yet.
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

Output:

>>> g = gencoordinates(1, 100)
>>> next(g)
(42, 98)
>>> next(g)
(9, 5)
>>> next(g)
(89, 29)
>>> next(g)
(67, 56)
>>> next(g)
(63, 65)
>>> next(g)
(92, 66)
>>> next(g)
(11, 46)
>>> next(g)
(68, 21)
>>> next(g)
(85, 6)
>>> next(g)
(95, 97)
>>> next(g)
(20, 6)
>>> next(g)
(20, 86)

As you can see, x coordinates can repeat (for example (20, 6) and (20, 86)), but no pair is yielded twice!
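A usage sketch: because gencoordinates never stops on its own, itertools.islice is a convenient way to take a fixed number of pairs. The generator is repeated so the snippet runs on its own; the sample of 50 pairs on a 10 x 10 grid is an arbitrary choice. Asking for more than the (n - m + 1)**2 possible pairs would leave the inner while loop spinning forever.

```python
from itertools import islice
from random import randint

def gencoordinates(m, n):
    # Same generator as above: remembers every pair it has yielded.
    seen = set()
    x, y = randint(m, n), randint(m, n)
    while True:
        seen.add((x, y))
        yield (x, y)
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)

# Take 50 pairs from the 10 x 10 grid 1..10; islice stops the endless generator.
pairs = list(islice(gencoordinates(1, 10), 50))
print(len(set(pairs)))             # 50: no pair is repeated
print(len({x for x, _ in pairs}))  # at most 10, so x values must repeat
```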

Answer 2

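Assuming your x and y coordinates are integers with 0 <= x < nx and 0 <= y < ny, for small nx and ny a simple approach might be to use np.mgrid to generate the set of all possible xy coordinates, reshape it into an (nx * ny, 2) array, and then draw random rows from it:

import numpy as np

nx, ny = 100, 200
xy = np.mgrid[:nx,:ny].reshape(2, -1).T
sample = xy.take(np.random.choice(xy.shape[0], 100, replace=False), axis=0)

If nx and/or ny is very large, creating the array of all possible coordinates becomes expensive. In that case it may be better to use a generator object that keeps track of previously used coordinates, as in James's answer.

Following @morningsun's suggestion, an alternative is to sample from the set of nx * ny indices into the flattened array and then convert those indices directly into x, y coordinates. This avoids constructing the whole nx * ny array of possible x, y combinations.

For comparison, here is a version of my original approach generalized to N-dimensional arrays, and a version that uses the new approach: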

import numpy as np

def sample_comb1(dims, nsamp):
    perm = np.indices(dims).reshape(len(dims), -1).T
    idx = np.random.choice(perm.shape[0], nsamp, replace=False)
    return perm.take(idx, axis=0)

def sample_comb2(dims, nsamp):
    idx = np.random.choice(np.prod(dims), nsamp, replace=False)
    return np.vstack(np.unravel_index(idx, dims)).T

There is not much difference in practice, but the benefit of the second method becomes more apparent for larger arrays:

In [1]: %timeit sample_comb1((100, 200), 100)
100 loops, best of 3: 2.59 ms per loop

In [2]: %timeit sample_comb2((100, 200), 100)
100 loops, best of 3: 2.4 ms per loop

In [3]: %timeit sample_comb1((1000, 2000), 100)
1 loops, best of 3: 341 ms per loop

In [4]: %timeit sample_comb2((1000, 2000), 100)
1 loops, best of 3: 319 ms per loop

If you have scikit-learn installed, sklearn.utils.random.sample_without_replacement offers a faster way to generate random indices without replacement, using Floyd's algorithm.
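For readers without scikit-learn, Floyd's algorithm itself is short. The sketch below is a plain-Python version (not scikit-learn's code) plugged into the same np.unravel_index step as sample_comb2; the names floyd_sample and sample_comb_floyd are hypothetical. The legacy np.random.choice(..., replace=False) builds a permutation of the whole population, whereas Floyd's algorithm does only nsamp iterations.

```python
import random

import numpy as np

def floyd_sample(n_population, n_samples, rng=random):
    """Floyd's algorithm: n_samples distinct ints from range(n_population).

    Uses n_samples iterations and O(n_samples) memory, independent of
    n_population. The result is a uniformly random *subset*; its order
    is not randomised.
    """
    if not 0 <= n_samples <= n_population:
        raise ValueError("need 0 <= n_samples <= n_population")
    chosen = set()
    for j in range(n_population - n_samples, n_population):
        t = rng.randint(0, j)  # inclusive on both ends
        # If t is already taken, j cannot be: every earlier pick is < j.
        chosen.add(j if t in chosen else t)
    return chosen

def sample_comb_floyd(dims, nsamp):
    # Hypothetical counterpart of sample_comb2: sample flat indices, then unravel.
    flat = floyd_sample(int(np.prod(dims)), nsamp)
    idx = np.fromiter(flat, dtype=np.intp, count=nsamp)
    return np.vstack(np.unravel_index(idx, dims)).T

coords = sample_comb_floyd((1000, 2000), 100)
print(coords.shape)  # (100, 2)
```

If the order of the sampled coordinates matters, shuffle the result afterwards (for example with np.random.shuffle(coords)).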

Answer 3

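@James Miles's answer is great, but to avoid an endless loop when I accidentally ask for too many values, I suggest the following (it also removes some duplicated code):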

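from random import randint

def gencoordinates(m, n):
    seen = set()
    x, y = randint(m, n), randint(m, n)
    # Stop once all (n + 1 - m)**2 possible pairs have been yielded.
    while len(seen) < (n + 1 - m)**2:
        # Redraw until the pair is new.
        while (x, y) in seen:
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)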

Note that an invalid range of values is still not checked here and will propagate down to randint.
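The generator above does not check that m <= n, so a bad range only surfaces later as an error from randint. A minimal sketch of a variant that validates the range first; the name gencoordinates_checked is hypothetical. Because it is a generator function, the ValueError is raised on the first next() call, not when the function is called.

```python
from random import randint

def gencoordinates_checked(m, n):
    """Yield distinct random (x, y) pairs with m <= x, y <= n until all are used."""
    if m > n:
        raise ValueError(f"empty range: m={m} > n={n}")
    seen = set()
    total = (n + 1 - m) ** 2  # number of possible pairs
    while len(seen) < total:
        x, y = randint(m, n), randint(m, n)
        while (x, y) in seen:  # rejection: redraw until the pair is new
            x, y = randint(m, n), randint(m, n)
        seen.add((x, y))
        yield (x, y)

print(sorted(gencoordinates_checked(0, 1)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

As the grid fills up, each new pair needs on average total / (total - len(seen)) draws, so the last few pairs are the most expensive to produce.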

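How to generate random pairs of numbers in Python, including pairs with one entry being the same and excluding pairs with both entries being the same?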
