在Python中随机选择数组中的连续元素

时间:2015-01-16 16:32:46

标签: python algorithm

我有一个索引列表,例如0 ... 365,我想选择少数,随机选择,无需替换,连续的子区域此列表。

index = [i+1 for i in range(365) ] 
#n could be 3
for i in range(n):
   exclusion_regions.append( get_random_contiguous_region(index) )

有没有人对 get_random_contiguous_region()

的实施提出建议?

4 个答案:

答案 0 :(得分:2)

你可以这样做:

import random

n = 3
index = [i+1 for i in range(10) ] 
slices = sorted(random.sample(range(0, len(index)), 2*n))
[index[start:end] for start, end in zip(slices[::2], slices[1::2])]

答案 1 :(得分:1)

我们需要一个while循环来确保我们不会重叠,您可以检查切片的长度是否符合任何其他条件,使用列表comp您无法指定不同的条件: 如果您希望随机切片占总列表大小的大约5%到15%,样本大小大约为30%:

from random import choice
from numpy import arange

index = [i + 1 for i in range(365)]
choices = []
seen = set()
ar = arange(0.05,.16, .01)
ln = len(index)
sample_size = 0
while sample_size < ln * .30:
    perc = choice(ar)  # get random 5, 10, 15 percent slices
    size = int(ln * perc)
    ch = choice(index[:-size+1]) # avoid falling off the side
    rn = index[ch:ch+size]
    if len(rn) == size and not seen.intersection(rn):
        seen.update(rn)
        choices.append(rn)
        sample_size += len(rn)
print(choices)

答案 2 :(得分:1)

这是一个以符号方式处理范围的解决方案,而不是考虑每个项目。

(对于你正在处理它的小基础可能是矫枉过正,但对于包含数万个项目的范围来说,效率会非常高。)


编辑:我已将其更新为允许将长度指定为整数或作为返回整数的0参数函数。您现在可以将长度作为分布给出,而不仅仅是常量。


import random

def range_intersection(a, b):
    if a.step == b.step == 1:
        return range(max(a.start, b.start), min(a.stop, b.stop), 1)
    else:
        # here be dragons!
        raise NotImplemented

def random_subrange(length, range_):
    start = random.randrange(
        range_.start,
        range_.stop - length * range_.step,
        range_.step
    )
    stop = start + length * range_.step
    return range(start, stop, range_.step)

def const_fn(n):
    def fn():
        return n
    return fn

def random_distinct_subranges(num, length, range_):
    if not callable(length):
        length = const_fn(length)
    ranges = []
    for n in range(num):
        while True:
            new_range = random_subrange(length(), range_)
            if not any(range_intersection(new_range, r) for r in ranges):
                ranges.append(new_range)
                break
    ranges.sort(key = lambda r: r.start)
    return ranges

然后

days = range(1, 366)

# pick 3 periods randomly without overlapping
periods = random_distinct_subranges(3, lambda:random.randint(5,15), days)
print(periods)

给出类似

的内容
[range(78, 92), range(147, 155), range(165, 173)]

可以像

一样迭代
from itertools import chain

rand_days = chain(*periods)
print(list(rand_days))

[78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 147, 148, 149, 150, 151, 152, 153, 154, 165, 166, 167, 168, 169, 170, 171, 172]

答案 3 :(得分:1)

这是一种安静简单的递归方法:索引列表随机分为给定大小范围内的连续序列。之后,选择其中三个子序列。

indexes = range(1, 80)
from random import randint, sample 

# recursive division of the sequence
def get_random_division(lst, minsize, maxsize):
    split_index = randint(minsize, maxsize)
    # if the remaining list would get too small, return the unsplit one
    if minsize>len(lst)-split_index:
        return [lst]
    return [lst[:split_index]] + get_random_division(lst[split_index:], minsize, maxsize)

# determine size range of the subdivisions
minsize, maxsize = 5, int(0.15*len(data))
# choose three of the subdivided sequences
sample(get_random_division(indexes, minsize, maxsize), 3)

输出:

[[17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
 [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46],
 [1, 2, 3, 4, 5]]