How do I create a list of random numbers without duplicates?

Time: 2012-03-18 02:35:16

Tags: python random

I tried using random.randint(0, 100), but some numbers were the same. Is there a method/module to create a list of unique random numbers?

def getScores():
    # open files to read and write
    f1 = open("page.txt", "r");
    p1 = open("pgRes.txt", "a");

    gScores = [];
    bScores = [];
    yScores = [];

    # run 50 tests of 40 random queries to implement "bootstrapping" method 
    for i in range(50):
        # get 40 random queries from the 50
        lines = random.sample(f1.readlines(), 40);

20 answers:

Answer 0 (score: 118)

This returns a list of 10 numbers selected from the range 0 to 99, without duplicates.

import random
random.sample(range(100), 10)

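With reference to your specific code example, you probably want to read all the lines from the file once and then select random lines from the list held in memory. For example: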

all_lines = f1.readlines()
for i in range(50):
    lines = random.sample(all_lines, 40)

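This way, you only need to read from the file once, before the loop. That is much more efficient than seeking back to the start of the file and calling f1.readlines() again on every loop iteration.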
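As a rough sketch of the read-once pattern described above: the helper name bootstrap_samples and the in-memory stand-in for the contents of page.txt are assumptions made for illustration, not part of the original code.

```python
import random

def bootstrap_samples(all_lines, rounds=50, size=40):
    """Draw `rounds` independent samples of `size` distinct lines each."""
    # all_lines is read once by the caller; every round samples from memory.
    return [random.sample(all_lines, size) for _ in range(rounds)]

# Stand-in for f1.readlines(); the real file contents are unknown here.
all_lines = ["query %d\n" % i for i in range(50)]
samples = bootstrap_samples(all_lines)
```

Each of the 50 samples contains 40 distinct lines, while different samples may share lines with each other.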

Answer 1 (score: 9)

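You can first create a list of numbers from a to b, where a and b are respectively the smallest and largest numbers in your list, and then shuffle it with the Fisher-Yates algorithm or with Python's random.shuffle method.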
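A minimal sketch of the Fisher-Yates approach mentioned here, assuming both bounds are inclusive. The function name fisher_yates_range is made up for this example.

```python
import random

def fisher_yates_range(a, b):
    """Return the integers a..b (inclusive) in a uniformly random order."""
    numbers = list(range(a, b + 1))
    # Walk from the last index down, swapping each slot with a randomly
    # chosen slot at or before it.
    for i in range(len(numbers) - 1, 0, -1):
        j = random.randint(0, i)  # randint is inclusive on both ends
        numbers[i], numbers[j] = numbers[j], numbers[i]
    return numbers

shuffled = fisher_yates_range(1, 10)
```

In practice random.shuffle does the same job; the explicit loop is only there to show the mechanics.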

Answer 2 (score: 9)

You can use the shuffle function from the random module like this:

import random

my_list = list(range(1, 100)) # list of integers from 1 to 99
                              # adjust these boundaries to fit your needs
random.shuffle(my_list)
print(my_list) # <- List of unique random numbers

Note that the shuffle method does not return a list as one might expect; it only shuffles, in place, the list that is passed to it.
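The in-place behaviour is easy to trip over, so here is a short sketch of it. The variable names are illustrative only.

```python
import random

my_list = list(range(1, 100))
result = random.shuffle(my_list)   # shuffles my_list in place
# result is None; the shuffled data lives in my_list itself.

# To leave the original untouched, sample the whole list instead:
original = list(range(1, 100))
shuffled_copy = random.sample(original, len(original))
```

random.sample with k equal to the list length returns a new shuffled list and does not modify its argument.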

Answer 3 (score: 7)

The solution presented in this answer works, but it can become a memory problem if the sample size is small and the population is huge (e.g. random.sample(insanelyLargeNumber, 10)).

To fix that, I would go with this:

import random

answer = set()
sampleSize = 10
answerSize = 0

# keep drawing until we have collected sampleSize distinct values
while answerSize < sampleSize:
    r = random.randint(0,100)
    if r not in answer:
        answerSize += 1
        answer.add(r)

# answer now contains 10 unique, random integers from 0..100
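For what it is worth, in Python 3 a range object is lazy, so the memory concern does not arise when the population is a range of integers: random.sample indexes into the range instead of materializing it. A minimal sketch, with an arbitrary population size chosen for illustration:

```python
import random

population = range(10**12)              # lazy: no list of 10**12 ints is built
picked = random.sample(population, 10)  # 10 distinct integers from the range
```

The set-based loop above remains useful when the population cannot be expressed as a range or another sequence.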

Answer 4 (score: 5)

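So, I realize this post is 6 years old, but there is another answer with (usually) better algorithmic performance, although it is less practical and has more overhead.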

Other answers include the shuffle method and the "try until valid" method using sets.

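If we are randomly choosing K integers without replacement from the interval 0...N-1, then the shuffle method uses O(N) storage and O(N) operations, which is annoying if we are choosing a small K from a large N. The set method only uses O(K) storage, but for K close to N it has a worst case of O(inf) and an expected cost of O(n*log(n)). (Imagine trying to randomly hit the last number out of two allowed answers, having already selected 999998 of them, for k = n-1 = 10^6.)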

So the set method is fine for K ~ 1, and the shuffle method is fine for K ~ N. Both use more than K RNG calls in expectation.
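The trade-off between the two methods can be sketched as a single helper that switches strategy depending on K. The function name sample_unique and the cut-over at half the population are assumptions chosen for illustration.

```python
import random

def sample_unique(n, k):
    """Return k distinct integers from 0..n-1, picking the cheaper strategy."""
    if k > n:
        raise ValueError("k must not exceed n")
    if k > n // 2:
        # Shuffle method: O(n) storage, but no retries.
        pool = list(range(n))
        random.shuffle(pool)
        return pool[:k]
    # Set method: O(k) storage, retries on collisions.
    chosen = set()
    while len(chosen) < k:
        chosen.add(random.randrange(n))
    # Note: iteration order of a set is not a random order.
    return list(chosen)
```

If the order of the result matters, the set branch needs a final random.shuffle on the returned list.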

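Another way: you can pretend to do the Fisher-Yates shuffle, and for every new random selection perform a binary search on the elements you have already selected, to find the value you would get if you were actually storing an array of all the elements you haven't chosen yet.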

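If your already-selected values are [2,4], and your random number generator spits out 2 in the interval (N - num_already_selected), then you pretend to select from [0,1,3,5,6,...] by counting the already-selected values that are less than the answer. In this case, your third selected value would be 3. Then, in the next step, if your random number were 2 again, it would map to 5 (in the pretend list [0,1,5,6]), because (the potential index of 5 in the sorted list of already-selected values [2,3,4], which is 3) + 2 = 5.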
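The mapping in this worked example can be checked with a deliberately simple linear scan. This is only a sketch for intuition (the helper name kth_unselected is made up), not the binary-search or tree version the answer is building towards.

```python
def kth_unselected(selected, r):
    """Map index r in the 'pretend' list of unselected values to its value."""
    value = r
    # Every already-selected number at or below the running value
    # pushes the answer one position to the right.
    for s in sorted(selected):
        if s <= value:
            value += 1
        else:
            break
    return value
```

With [2, 4] selected, index 2 maps to 3; with [2, 3, 4] selected, index 2 maps to 5, matching the numbers in the text.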

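So store the already-selected values in a balanced binary search tree, store the rank (the number of values less than that value) at each node, and select a random number R from the range (0 ... n-(number already selected)). Then descend the tree as if searching, but with a search value of R plus the rank of whatever node you are at. When you reach a leaf node, add the random number to that node's rank and insert the sum into the balanced binary tree.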

Once you have K elements, read them off the tree into an array and shuffle (if order is important).

This takes O(K) storage, O(K*log(K)) time, and exactly K randint calls.

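Example implementation of random sampling (the final ordering is not random, but you can do an O(K) shuffle afterwards), with O(k) storage and O(k log^2(k)) time (not O(k log(k)), because this implementation cannot custom-descend the balanced binary tree):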

import random

from sortedcontainers import SortedList


def sample(n, k):
    '''
    Return random k-length-subset of integers from 0 to n-1. Uses only O(k) 
    storage. Bounded k*log^2(k) worst case. K RNG calls. 
    '''
    ret = SortedList()
    for i in range(k):
        to_insert = random.randint(0, n-1 - len(ret))
        to_insert = binsearch_adding_rank(ret, to_insert)
        ret.add(to_insert)

    return ret

def binsearch_adding_rank(A, v):
    l, u = 0, len(A)-1
    m=0
    while l <= u:
        m = l+(u-l)//2
        if v + m >= A[m]:
            l = m+1
            m+=1 # We're binary searching for partitions, so if the last step was to the right then add one to account for offset because that's where our insert would be.
        elif v+m < A[m]:
            u = m-1
    return v+m

And to show validity:

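If we were doing the Fisher-Yates shuffle, having already chosen [1,4,6,7,8,9,15,16], with random number 5, our yet-to-be-chosen array would look like [0,2,3,5,10,11,12,...], so element 5 is 11. Thus our binsearch function should return 11, given 5 and [1,4,6,7,8,9,15,16]: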

assert binsearch_adding_rank([1,4,6,7,8,9,15,16], 5) == 11

The inverse of [1,2,3] is [0,4,5,6,7,8,...], the 5th element of which is 8, so:

assert binsearch_adding_rank([1,2,3], 5) == 8

The inverse of [2,3,5] is [0,1,4,6,...], the 1st element of which is (still) 1, so:

assert binsearch_adding_rank([2,3,5], 1) == 1

The inverse is [0,6,7,8,...], the 3rd element is 8, and:

assert binsearch_adding_rank([1,2,3,4,5,10], 3) == 8

And to test the overall function:

# Edge cases: 
assert sample(50, 0) == []
assert sample(50, 50) == list(range(0,50))

# Variance should be small and equal among possible values:
x = [0]*10
for i in range(10_000):
    for v in sample(10, 5):
        x[v] += 1
for v in x:
    assert abs(5_000 - v) < 250, v
del x

# Check for duplication: 

y = sample(1500, 1000)
assert len(frozenset(y)) == len(y)
del y

In practice, though, use the shuffle method for K ~> N/2 and the set method for K ~< N/2.

edit: Here's another way to do it, using recursion! O(k*log(n)), I think.

def divide_and_conquer_sample(n, k, l=0):
    u = n-1
    # Base cases:
    if k == 0:
        return []
    elif k == n-l:
        return list(range(l, n))
    elif k == 1:
        return [random.randint(l, u)]

    # Compute how many left and how many right:
    m = l + (u-l)//2
    k_right = 0
    k_left = 0
    for i in range(k):
        # Base probability: (# of available values in right interval) / (total available values)
        if random.random() <= (n-m - k_right)/(n-l-k_right-k_left):
            k_right += 1
        else:
            k_left += 1
    # Recur
    return divide_and_conquer_sample(n, k_right, m) + divide_and_conquer_sample(m, k_left, l)

Answer 5 (score: 3)

If the list of N numbers from 1 to N is randomly generated, then yes, some numbers may be repeated.

If you want a list of the numbers from 1 to N in a random order, fill an array with the integers from 1 to N, and then use a Fisher-Yates shuffle or Python's random.shuffle().

Answer 6 (score: 3)

Linear Congruential Pseudo-random Number Generator

O(1) Memory

O(k) Operations

This problem can be solved with a simple Linear Congruential Generator. This requires constant memory overhead (8 integers) and at most 2*(sequence length) computations.

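All other solutions use more memory and more compute! If you only need a few random sequences, this method will be significantly cheaper. For ranges of size N, if you want to generate on the order of N unique k-sequences or more, I recommend the accepted solution using the builtin method random.sample(range(N),k), as this has been optimized in python for speed.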

Code

# Return a randomized "range" using a Linear Congruential Generator
# to produce the number sequence. Parameters are the same as for 
# python builtin "range".
#   Memory  -- storage for 8 integers, regardless of parameters.
#   Compute -- at most 2*"maximum" steps required to generate sequence.
#
def random_range(start, stop=None, step=None):
    import random, math
    # Set default values the same way "range" does.
    if stop is None: start, stop = 0, start
    if step is None: step = 1
    # Use a mapping to convert a standard range into the desired range.
    mapping = lambda i: (i*step) + start
    # Compute the number of numbers in this range.
    maximum = (stop - start) // step
    # Seed range with a random integer.
    value = random.randint(0,maximum)
    # 
    # Construct an offset, multiplier, and modulus for a linear
    # congruential generator. These generators are cyclic and
    # non-repeating when they maintain the properties:
    # 
    #   1) "modulus" and "offset" are relatively prime.
    #   2) ["multiplier" - 1] is divisible by all prime factors of "modulus".
    #   3) ["multiplier" - 1] is divisible by 4 if "modulus" is divisible by 4.
    # 
    offset = random.randint(0,maximum) * 2 + 1      # Pick a random odd-valued offset.
    multiplier = 4*(maximum//4) + 1                 # Pick a multiplier 1 greater than a multiple of 4.
    modulus = int(2**math.ceil(math.log2(maximum))) # Pick a modulus just big enough to generate all numbers (power of 2).
    # Track how many random numbers have been returned.
    found = 0
    while found < maximum:
        # If this is a valid value, yield it in generator fashion.
        if value < maximum:
            found += 1
            yield mapping(value)
        # Calculate the next value in the sequence.
        value = (value*multiplier + offset) % modulus
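The three full-period conditions listed in the comments above can be checked on a tiny example. The helper lcg_cycle and the specific constants are illustrative choices, not taken from the answer.

```python
def lcg_cycle(multiplier, offset, modulus, seed=0):
    """Return the first `modulus` values of value -> (value*a + c) % m."""
    value = seed
    visited = []
    for _ in range(modulus):
        visited.append(value)
        value = (value * multiplier + offset) % modulus
    return visited

# modulus 16, offset 3 (coprime to 16), multiplier 5 (5 - 1 is divisible
# by 2 and by 4): all three conditions hold, so every value appears once.
full = lcg_cycle(5, 3, 16)

# multiplier 3 breaks the third condition (3 - 1 is not divisible by 4),
# so the sequence repeats before covering all 16 values.
short = lcg_cycle(3, 3, 16)
```

Here full is a permutation of 0..15, while short revisits values early.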

Usage

The usage of this function "random_range" is the same as for any generator (like "range"). An example:

# Show off random range.
print()
for v in range(3,6):
    v = 2**v
    l = list(random_range(v))
    print("Need",v,"found",len(set(l)),"(min,max)",(min(l),max(l)))
    print("",l)
    print()

Sample Results

Required 8 cycles to generate a sequence of 8 values.
Need 8 found 8 (min,max) (0, 7)
 [1, 0, 7, 6, 5, 4, 3, 2]

Required 16 cycles to generate a sequence of 9 values.
Need 9 found 9 (min,max) (0, 8)
 [3, 5, 8, 7, 2, 6, 0, 1, 4]

Required 16 cycles to generate a sequence of 16 values.
Need 16 found 16 (min,max) (0, 15)
 [5, 14, 11, 8, 3, 2, 13, 1, 0, 6, 9, 4, 7, 12, 10, 15]

Required 32 cycles to generate a sequence of 17 values.
Need 17 found 17 (min,max) (0, 16)
 [12, 6, 16, 15, 10, 3, 14, 5, 11, 13, 0, 1, 4, 8, 7, 2, ...]

Required 32 cycles to generate a sequence of 32 values.
Need 32 found 32 (min,max) (0, 31)
 [19, 15, 1, 6, 10, 7, 0, 28, 23, 24, 31, 17, 22, 20, 9, ...]

Required 64 cycles to generate a sequence of 33 values.
Need 33 found 33 (min,max) (0, 32)
 [11, 13, 0, 8, 2, 9, 27, 6, 29, 16, 15, 10, 3, 14, 5, 24, ...]

Answer 7 (score: 3)

If you need to sample extremely large numbers, you cannot use range:

random.sample(range(10000000000000000000000000000000), 10)

because it throws:

OverflowError: Python int too large to convert to C ssize_t

Also, if random.sample cannot produce the number of items you want because the range is too small:

random.sample(range(2), 1000)

it throws:

ValueError: Sample larger than population

This function resolves both problems:

import random

def random_sample(count, start, stop, step=1):
    def gen_random():
        while True:
            yield random.randrange(start, stop, step)

    def gen_n_unique(source, n):
        seen = set()
        seenadd = seen.add
        for i in (i for i in source() if i not in seen and not seenadd(i)):
            yield i
            if len(seen) == n:
                break

    return [i for i in gen_n_unique(gen_random,
                                    min(count, int(abs(stop - start) / abs(step))))]

Usage with extremely large numbers:

print('\n'.join(map(str, random_sample(10, 2, 10000000000000000000000000000000))))

Sample result:

7822019936001013053229712669368
6289033704329783896566642145909
2473484300603494430244265004275
5842266362922067540967510912174
6775107889200427514968714189847
9674137095837778645652621150351
9969632214348349234653730196586
1397846105816635294077965449171
3911263633583030536971422042360
9864578596169364050929858013943

Usage where the range is smaller than the number of requested items:

print(', '.join(map(str, random_sample(100000, 0, 3))))

Sample result:

2, 0, 1

It also works with negative ranges and steps:

print(', '.join(map(str, random_sample(10, 10, -10, -2))))
print(', '.join(map(str, random_sample(10, 5, -5, -2))))

Sample results:

2, -8, 6, -2, -4, 0, 4, 10, -6, 8
-3, 1, 5, -1, 3
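When the population does fit in a range object, the "sample larger than population" case alone can be handled by clamping the requested count. A minimal sketch, with clamped_sample as a made-up helper name:

```python
import random

def clamped_sample(population, count):
    """Sample at most len(population) distinct items, never raising ValueError."""
    k = min(count, len(population))
    return random.sample(population, k)

everything = clamped_sample(range(3), 100000)   # only 3 values exist
```

This does not help with populations too large for len(), which is the case the generator-based function above is aimed at.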

Answer 8 (score: 1)

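To obtain a program that generates a list of random values without duplicates, and that is deterministic, efficient and built with basic programming constructs, consider the function extractSamples defined below: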

def extractSamples(populationSize, sampleSize, intervalLst) :
    import random
    if (sampleSize > populationSize) :
        raise ValueError("sampleSize = "+str(sampleSize) +" > populationSize (= " + str(populationSize) + ")")
    samples = []
    while (len(samples) < sampleSize) :
        i = random.randint(0, (len(intervalLst)-1))
        (a,b) = intervalLst[i]
        sample = random.randint(a,b)
        if (a==b) :
            intervalLst.pop(i)
        elif (a == sample) : # shorten beginning of interval
            intervalLst[i] = (sample+1, b)
        elif ( sample == b) : # shorten interval end
            intervalLst[i] = (a, sample - 1)
        else :
            intervalLst[i] = (a, sample - 1)
            intervalLst.append((sample+1, b))
        samples.append(sample)
    return samples

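The basic idea is to keep track of the intervals, intervalLst, of possible values from which to select the required elements. This is deterministic in the sense that we are guaranteed to generate a sample within a fixed number of steps (dependent solely on populationSize and sampleSize).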
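The interval bookkeeping at the heart of this idea can be sketched in isolation. The helper remove_value is a hypothetical name, and it assumes the removed value lies inside the interval.

```python
def remove_value(interval, value):
    """Return the sub-intervals of (a, b) left after removing `value`.

    Assumes a <= value <= b; both ends of an interval are inclusive.
    """
    a, b = interval
    remaining = []
    if a <= value - 1:
        remaining.append((a, value - 1))   # part to the left of value
    if value + 1 <= b:
        remaining.append((value + 1, b))   # part to the right of value
    return remaining
```

Removing an interior value splits the interval in two, removing an endpoint shortens it, and removing the only value of a one-element interval leaves nothing, which covers the four branches of extractSamples.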

To use the above function to generate our required list:

In [3]: populationSize, sampleSize = 10**17, 10**5

In [4]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 289 ms, sys: 9.96 ms, total: 299 ms
Wall time: 293 ms

We may also compare it with an earlier solution (for a lower value of populationSize):

In [5]: populationSize, sampleSize = 10**8, 10**5

In [6]: %time lst = random.sample(range(populationSize), sampleSize)
CPU times: user 1.89 s, sys: 299 ms, total: 2.19 s
Wall time: 2.18 s

In [7]: %time lst1 = extractSamples(populationSize, sampleSize, [(0, populationSize-1)])
CPU times: user 449 ms, sys: 8.92 ms, total: 458 ms
Wall time: 442 ms

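Note that I reduced the populationSize value because the random.sample solution produces a Memory Error for higher values (also mentioned in previous answers here and here). For the above values, we can also observe that extractSamples outperforms the random.sample approach.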

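P.S.: Though the core approach is similar to my earlier answer, there are substantial modifications in both implementation and approach, along with improved clarity.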

Answer 9 (score: 0)

You can use the Numpy library for a quick answer, as shown below:

The given code snippet lists 6 unique numbers in the range 0 to 5. You can adjust the parameters to suit your needs.

import numpy as np
import random
a = np.linspace( 0, 5, 6 )
random.shuffle(a)
print(a)

Output

[ 2.  1.  5.  3.  4.  0.]

It does not impose any constraints of the kind we see in random.sample, as referred to here.

Hope this helps a bit.

Answer 10 (score: 0)

If you wish to ensure that the numbers being added are unique, you could use a Set object if you are using 2.7 or greater, or import the sets module if not.

As others have mentioned, this means the numbers are not truly random.

Answer 11 (score: 0)

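The answer provided here works very well with respect to both time and memory, but it is a bit more complicated because it uses advanced Python constructs such as yield. The simpler answer works well in practice, but its problem is that it may generate many spurious integers before actually constructing the required set. Try it out with populationSize = 1000 and sampleSize = 999. In theory, there is a chance that it does not terminate.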
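To get a feel for how many spurious integers the retry approach produces in that scenario, the draws can simply be counted. This is an illustrative sketch; draws_needed is a made-up helper, not code from either referenced answer.

```python
import random

def draws_needed(population_size, sample_size):
    """Count the random draws used by the retry-until-unique approach."""
    seen = set()
    draws = 0
    while len(seen) < sample_size:
        seen.add(random.randrange(population_size))  # duplicates are wasted draws
        draws += 1
    return draws

draws = draws_needed(1000, 999)
```

For populationSize = 1000 and sampleSize = 999 the count typically lands in the thousands, far more than the 999 values actually kept.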

The answer below addresses both issues, as it is deterministic and somewhat efficient, though currently not as efficient as the other two.

def randomSample(populationSize, sampleSize):
  populationStr = str(populationSize)
  dTree, samples = {}, []
  for i in range(sampleSize):
    val, dTree = getElem(populationStr, dTree, '')
    samples.append(int(val))
  return samples, dTree

where the functions getElem and percolateUp are defined below:

import random

def getElem(populationStr, dTree, key):
  msd  = int(populationStr[0])
  if key not in dTree:
    dTree[key] = list(range(msd + 1))  # list() so .pop()/.remove() also work on Python 3
  idx = random.randint(0, len(dTree[key]) - 1)
  key = key +  str(dTree[key][idx])
  if len(populationStr) == 1:
    dTree[key[:-1]].pop(idx)
    return key, (percolateUp(dTree, key[:-1]))
  newPopulation = populationStr[1:]
  if int(key[-1]) != msd:
    newPopulation = str(10**(len(newPopulation)) - 1)
  return getElem(newPopulation, dTree, key)

def percolateUp(dTree, key):
  while (dTree[key] == []):
    dTree[key[:-1]].remove( int(key[-1]) )
    key = key[:-1]
  return dTree

Finally, the timing was about 15 ms on average for a large value of n, as shown below:

In [3]: n = 10000000000000000000000000000000

In [4]: %time l,t = randomSample(n, 5)
Wall time: 15 ms

In [5]: l
Out[5]:
[10000000000000000000000000000000L,
 5731058186417515132221063394952L,
 85813091721736310254927217189L,
 6349042316505875821781301073204L,
 2356846126709988590164624736328L]

Answer 12 (score: 0)

A very simple function that also solves your problem:

from random import randint

data = []

def unique_rand(inicial, limit, total):

        data = []

        i = 0

        while i < total:
            number = randint(inicial, limit)
            if number not in data:
                data.append(number)
                i += 1

        return data


data = unique_rand(1, 60, 6)

print(data)


"""

prints something like 

[34, 45, 2, 36, 25, 32]

"""

Answer 13 (score: 0)

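The problem with the set-based approaches ("if the random value is already in the return values, try again") is that their runtime is undetermined because of collisions (each of which requires another "try again" iteration), especially when a large number of random values are drawn from the range.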

An alternative that is not prone to this non-deterministic runtime is the following:

import bisect
import random

def fast_sample(low, high, num):
    """ Samples :param num: integer numbers in range of
        [:param low:, :param high:) without replacement
        by maintaining a list of ranges of values that
        are permitted.

        This list of ranges is used to map a random number
        of a contiguous range (`r_n`) to a permissible
        number `r` (from `ranges`).
    """
    ranges = [high]
    high_ = high - 1
    while len(ranges) - 1 < num:
        # generate a random number from an ever decreasing
        # contiguous range (which we'll map to the true
        # random number).
        # consider an example with low=0, high=10,
        # part way through this loop with:
        #
        # ranges = [0, 2, 3, 7, 9, 10]
        #
        # r_n :-> r
        #   0 :-> 1
        #   1 :-> 4
        #   2 :-> 5
        #   3 :-> 6
        #   4 :-> 8
        r_n = random.randint(low, high_)
        range_index = bisect.bisect_left(ranges, r_n)
        r = r_n + range_index
        for i in range(range_index, len(ranges)):
            if ranges[i] <= r:
                # as many "gaps" we iterate over, as much
                # is the true random value (`r`) shifted.
                r = r_n + i + 1
            elif ranges[i] > r_n:
                break
        # mark `r` as another "gap" of the original
        # [low, high) range.
        ranges.insert(i, r)
        # Fewer values possible.
        high_ -= 1
    # `ranges` happens to contain the result.
    return ranges[:-1]

Answer 14 (score: 0)

Sampling integers without replacement between minval and maxval:

import numpy as np

minval, maxval, n_samples = -50, 50, 10
generator = np.random.default_rng(seed=0)
samples = generator.permutation(np.arange(minval, maxval))[:n_samples]

# or, if minval is 0,
samples = generator.permutation(maxval)[:n_samples]

With jax:

import jax

minval, maxval, n_samples = -50, 50, 10
key = jax.random.PRNGKey(seed=0)
samples = jax.random.permutation(key, jax.numpy.arange(minval, maxval))[:n_samples]

Answer 15 (score: 0)

import random

sourcelist=[]
resultlist=[]

for x in range(100):
    sourcelist.append(x)

for y in sourcelist:
    resultlist.insert(random.randint(0,len(resultlist)),y)

print (resultlist)

Answer 16 (score: 0)

Here is a very small function I made; I hope it helps!

import random
numbers = list(range(0, 100))
random.shuffle(numbers)

Answer 17 (score: 0)

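If the amount of numbers you want is random, you can do something like this. In this case, length is the highest number you want to choose from.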

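If it notices that the new random number has already been chosen, it subtracts 1 from count (since count was incremented before it knew whether the number was a duplicate). If the number is not in the list yet, do what you want with it and add it to the list so that it cannot be picked again.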

import random
def randomizer(): 
            chosen_number=[]
            count=0
            user_input = int(input("Enter number for how many rows to randomly select: "))
            numlist=[]
            #length = whatever the highest number you want to choose from
            while 1<=user_input<=length:
                count=count+1
                if count>user_input:
                    break
                else:
                    chosen_number = random.randint(0, length)
                    if chosen_number in numlist:
                        count=count-1
                        continue
                    if chosen_number not in numlist:
                        numlist.append(chosen_number)
                        #do what you want here

Answer 18 (score: -1)

import random
result=[]
for i in range(1,50):
    rng=random.randint(1,20)
    result.append(rng)

Answer 19 (score: -2)

From the CLI in win xp:

python -c "import random; print(sorted(set([random.randint(6,49) for i in range(7)]))[:6])"

In Canada we have the 6/49 Lotto. I just wrap the above code in lotto.bat and run C:\home\lotto.bat or just C:\home\lotto.

Because random.randint often repeats a number, I use set with range(7) and then shorten the result to a length of 6.

Occasionally, if a number repeats more than 2 times, the resulting list will have fewer than 6 entries.

EDIT: However, random.sample(range(6,49),6) is the correct way to go.
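The difference between the two expressions in this answer can be sketched side by side. The function names are made up for this example, and the ranges are copied unchanged from the answer.

```python
import random

def lotto_set_trick():
    # Mirrors the one-liner: 7 draws, de-duplicated, sorted, cut to 6.
    # Repeated draws shrink the set, so fewer than 6 numbers can come back.
    return sorted(set(random.randint(6, 49) for _ in range(7)))[:6]

def lotto_sample():
    # random.sample never repeats a value, so this always returns 6 numbers.
    return sorted(random.sample(range(6, 49), 6))

ticket = lotto_sample()
maybe_short = lotto_set_trick()
```

Note that randint(6, 49) includes 49 while range(6, 49) stops at 48, so the two expressions as written do not draw from exactly the same set of numbers.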