
Time: 2012-10-09 10:45:23

Tags: python random

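I have a list a_tot with 1500 elements, and I would like to split it randomly into two lists: a_1 with 1300 elements and a_2 with 200 elements. My question is about the best way to randomize the original 1500-element list. Once the list is randomized, I can take one slice of 1300 and another of 200. One way is to use random.shuffle; another is to use random.sample. Is there any difference in the quality of randomization between the two methods? The data in list 1 should be a random sample, and so should the data in list 2. Any suggestions?

Using shuffle: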

import random

random.shuffle(a_tot)    # randomize the list in place
a_1 = a_tot[0:1300]      # pick the first 1300
a_2 = a_tot[1300:]       # pick the last 200


Using sample:

new_t = random.sample(a_tot, len(a_tot))    # get a randomized copy of the list
a_1 = new_t[0:1300]      # pick the first 1300
a_2 = new_t[1300:]       # pick the last 200

6 answers:

Answer 0 (score: 3)


def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """

    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]


def sample(self, population, k):
    """Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged.  The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples.  This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique.  If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)
    """

    # XXX Although the documentation says `population` is "a sequence",
    # XXX attempts are made to cater to any iterable with a __len__
    # XXX method.  This has had mixed success.  Examples from both
    # XXX sides:  sets work fine, and should become officially supported;
    # XXX dicts are much harder, and have failed in various subtle
    # XXX ways across attempts.  Support for mapping types should probably
    # XXX be dropped (and users should pass mapping.keys() or .values()
    # XXX explicitly).

    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection.  For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    random = self.random
    _int = int
    result = [None] * k
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4)) # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in xrange(k):         # invariant:  non-selected at [0,n-i)
            j = _int(random() * (n-i))
            result[i] = pool[j]
            pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in xrange(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                result[i] = population[j]
        except (TypeError, KeyError):   # handle (at least) sets
            if isinstance(population, list):
                raise
            return self.sample(tuple(population), k)
    return result

As you can see, in both cases the randomization is essentially done by the line int(random() * n). So the underlying algorithm is essentially the same.
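As a minimal sketch of how the split from the question could be wrapped up, assuming Python 3 and a stand-in list of 1500 integers for a_tot: random.sample with k equal to the list length returns a shuffled copy, and slicing that copy gives the two parts. The name split_randomly and its parameters are hypothetical, chosen only for illustration. Since the docstring quoted above says the result is in selection order, each slice should itself be a valid random sample.

```python
import random

def split_randomly(items, first_size, rng=random):
    """Return (first, second), a random partition of items.

    split_randomly is an illustrative helper, not part of the random module.
    The input sequence is left unchanged.
    """
    if not 0 <= first_size <= len(items):
        raise ValueError("first_size must be between 0 and len(items)")
    shuffled = rng.sample(items, len(items))  # shuffled copy of the input
    return shuffled[:first_size], shuffled[first_size:]

a_tot = list(range(1500))               # stand-in data, assumed for illustration
a_1, a_2 = split_randomly(a_tot, 1300)
print(len(a_1), len(a_2))               # 1300 200
```

Using random.shuffle(a_tot) followed by the same two slices would give an equivalent split, at the cost of reordering a_tot itself.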

Answer 1 (score: 1)



Answer 2 (score: 1)

There are two major differences between shuffle() and sample():



It is interesting to show the conceptual relationship between the two by demonstrating how shuffle() can be implemented in terms of sample():

def shuffle(p):
   p[:] = sample(p, len(p))

Or vice versa, implementing sample() in terms of shuffle():

def sample(p, k):
   p = list(p)      # copy, so the caller's data is not modified
   shuffle(p)
   return p[:k]
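To try the two ideas side by side, they can be given distinct names so that neither shadows the other or the standard library functions. This is only a sketch of the conceptual relationship, assuming Python 3; the names shuffle_via_sample and sample_via_shuffle are hypothetical, and the real implementations in the random module are more careful about efficiency.

```python
import random

def shuffle_via_sample(p):
    """Shuffle list p in place by copying back a full-length sample."""
    p[:] = random.sample(p, len(p))

def sample_via_shuffle(p, k):
    """Return k randomly chosen elements of p without modifying p."""
    p = list(p)          # work on a copy so the caller's data is untouched
    random.shuffle(p)
    return p[:k]

data = list(range(10))
picked = sample_via_shuffle(data, 3)   # data keeps its original order here
shuffle_via_sample(data)               # data is now reordered in place
```

Note that sample_via_shuffle always shuffles the whole copy even when k is small, which is the extra work a real sample() can avoid.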


Answer 3 (score: 0)


Answer 4 (score: 0)


Answer 5 (score: 0)

from random import shuffle
from random import sample 
x = [[i] for i in range(10)]

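shuffle updates the list in place, whereas sample returns a new list. sample takes an argument for the number of elements to pick, whereas shuffle always yields a list of the same length as the input.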
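A short sketch of that difference, assuming Python 3 and reusing the list x from this answer (the other variable names are made up for illustration):

```python
import random

x = [[i] for i in range(10)]
original_order = list(x)

picked = random.sample(x, 4)       # new list of 4 elements; x is not modified
unchanged = (x == original_order)

result = random.shuffle(x)         # reorders x in place and returns None
print(unchanged, result, len(picked), len(x))   # True None 4 10
```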