Randomly extract x items from a list using Python

Time: 2014-05-04 17:26:31

Tags: python list random indices

Starting with two lists, for example:

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

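I want the user to enter how many items they want to extract, as a percentage of the overall list length, and the same randomly chosen indices to be extracted from each list. For example, if I wanted 50%, the output would be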

newLstOne = ['8', '1', '3', '7', '5']
newLstTwo = ['8', '1', '3', '7', '5']

I achieved this with the following code:

from random import randrange

lstOne = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
lstTwo = [ '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']

LengthOfList = len(lstOne)
print LengthOfList

PercentageToUse = input("What Percentage Of Reads Do you want to extract? ")
RangeOfListIndices = []

HowManyIndicesToMake = (float(PercentageToUse)/100)*float(LengthOfList)
print HowManyIndicesToMake

for x in lstOne:
    if len(RangeOfListIndices)==int(HowManyIndicesToMake):
        break
    else:
        random_index = randrange(0,LengthOfList)
        RangeOfListIndices.append(random_index)

print RangeOfListIndices


newlstOne = []
newlstTwo = []

for x in RangeOfListIndices:
    newlstOne.append(lstOne[int(x)])
for x in RangeOfListIndices:
    newlstTwo.append(lstTwo[int(x)])

print newlstOne
print newlstTwo
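Note that the loop above calls randrange independently for each index, so the same index can be drawn more than once (sampling with replacement). As a point of comparison, the sketch below reproduces that behaviour in Python 3.6+ with a single call to random.choices. The function name subsample_with_replacement and its rng parameter are hypothetical, chosen only for this illustration.

```python
import random

def subsample_with_replacement(list_one, list_two, percentage, rng=random):
    """Draw the same random indices from both lists, allowing repeats
    (hypothetical helper mirroring the question's randrange loop)."""
    n = len(list_one)
    k = int(n * percentage / 100)          # number of indices to draw
    indices = rng.choices(range(n), k=k)   # with replacement: duplicates possible
    return [list_one[i] for i in indices], [list_two[i] for i in indices]

lstOne = [str(i) for i in range(1, 11)]
lstTwo = [str(i) for i in range(1, 11)]
newLstOne, newLstTwo = subsample_with_replacement(lstOne, lstTwo, 50)
print(newLstOne)
print(newLstTwo)
```

If repeated indices are not wanted, a sampler without replacement (such as random.sample) is the better fit.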

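However, I was wondering whether there is a more efficient way to do this, since in my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?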

Thanks

3 answers:

Answer 0 (score: 7)

Q. I want to have the user input how many items they want to extract, as a percentage of the overall list length, and the same indices from each list to be randomly extracted.

A. The most straightforward approach follows your specification directly:

import random

percentage = float(raw_input('What percentage? '))
k = int(len(list1) * percentage // 100)   # random.sample() needs an integer count
indices = random.sample(xrange(len(list1)), k)
new_list1 = [list1[i] for i in indices]
new_list2 = [list2[i] for i in indices]
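The snippet above is Python 2 (raw_input, xrange). A minimal Python 3 sketch of the same idea is shown here: choose k distinct indices once with random.sample and reuse them for both lists. The function name subsample_same_indices is hypothetical, and the 145,000-item lists are made up purely to match the scale mentioned in the question.

```python
import random

def subsample_same_indices(list1, list2, percentage, rng=random):
    """Pick k distinct indices once and apply them to both lists
    (hypothetical helper, Python 3)."""
    if len(list1) != len(list2):
        raise ValueError("lists must have the same length")
    k = int(len(list1) * percentage // 100)      # integer count for sample()
    indices = rng.sample(range(len(list1)), k)   # distinct indices, no repeats
    return [list1[i] for i in indices], [list2[i] for i in indices]

# Illustrative data at the scale mentioned in the question.
big_one = list(range(145_000))
big_two = [str(x) for x in big_one]
sub_one, sub_two = subsample_same_indices(big_one, big_two, 10)
print(len(sub_one))  # 14500
```

Passing range(...) to sample() avoids building a copy of the data; only the k chosen indices are materialized.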

Q. In my actual use case this is subsampling from 145,000 items. Furthermore, is randrange sufficiently free of bias at this scale?

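A. In both Python 2 and Python 3, the random.randrange() function completely eliminates bias (it uses the internal _randbelow() method, which makes repeated random draws until it obtains an unbiased result).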

In Python 2, the random.sample() function is slightly biased, but only in the round-off of the last of 53 bits. In Python 3, random.sample() uses the internal _randbelow() method and has no bias.
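To make the "retry until unbiased" idea concrete, here is a small sketch of rejection sampling built on random.getrandbits: draw just enough random bits to cover n, and throw away any value that falls outside [0, n). This is an illustration of the technique only, not CPython's actual _randbelow source, and the name randbelow is hypothetical.

```python
import random
from collections import Counter

def randbelow(n, rng=random):
    """Return an unbiased integer in [0, n) by rejection sampling.

    Illustrative sketch of the idea; not CPython's internal implementation.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    k = n.bit_length()             # bits needed so that 2**k > n - 1
    r = rng.getrandbits(k)         # uniform over [0, 2**k)
    while r >= n:                  # out of range: reject and draw again
        r = rng.getrandbits(k)
    return r                       # every value in [0, n) is equally likely

# Rough empirical check: counts for 0..9 should all be close to 10,000.
counts = Counter(randbelow(10) for _ in range(100_000))
print(sorted(counts.items()))
```

Because each accepted value comes from a uniform draw over a range at most twice as large as n, the expected number of retries stays below two, whatever the size of n.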

Answer 1 (score: 1)

Just zip the two lists together, sample with random.sample, and then zip again to transpose the result back into two lists.

import random

_zips = random.sample(list(zip(lstOne, lstTwo)), 5)   # list() needed in Python 3, where zip is lazy

new_list_1, new_list_2 = zip(*_zips)

Demo:

list_1 = range(1,11)
list_2 = list('abcdefghij')

_zips = random.sample(list(zip(list_1, list_2)), 5)

new_list_1, new_list_2 = zip(*_zips)

new_list_1
Out[33]: (3, 1, 9, 8, 10)

new_list_2
Out[34]: ('c', 'a', 'i', 'h', 'j')
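The same zip/sample/unzip approach can be wrapped as a small Python 3 function, as sketched below. The name paired_sample is hypothetical. The explicit list() is there because zip returns an iterator in Python 3, and the empty-sample guard is needed because unpacking zip(*[]) into two names would fail.

```python
import random

def paired_sample(list_1, list_2, k, rng=random):
    """Sample k aligned (item1, item2) pairs, then split them back apart
    (hypothetical helper)."""
    pairs = list(zip(list_1, list_2))   # sample() needs a sequence, not an iterator
    chosen = rng.sample(pairs, k)       # k distinct positions; pairs stay aligned
    if not chosen:
        return [], []                   # nothing to transpose
    new_1, new_2 = zip(*chosen)         # transpose back into two tuples
    return list(new_1), list(new_2)

list_1 = list(range(1, 11))
list_2 = list('abcdefghij')
new_list_1, new_list_2 = paired_sample(list_1, list_2, 5)
print(new_list_1, new_list_2)
```

Zipping copies every pair, so for very large inputs sampling indices directly uses less memory than building the full list of tuples.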

Answer 2 (score: 1)

The way you are doing it looks fine to me.

If you want to avoid sampling the same item more than once, you can do the following:

import random

a = len(lstOne)
k = int(HowManyIndicesToMake)   # number of items to extract, computed as in the question
choose_from = list(range(a))    # list of indices 0 .. len(lstOne)-1
random.shuffle(choose_from)     # shuffle the indices in place
newlstOne = []
newlstTwo = []
for i in choose_from[:k]:       # the first k shuffled indices are all distinct
    newlstOne.append(lstOne[i]) # take the items at the same random positions
    newlstTwo.append(lstTwo[i]) # from both original lists
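Shuffling the whole index list does n swaps even when only k indices are needed. A partial Fisher-Yates shuffle, sketched below, stops after the first k positions, which still yields k distinct, uniformly chosen indices. This is a swapped-in variant of the answer's idea, not something the answer itself proposes, and the name partial_shuffle_indices is hypothetical. Building the index list is still O(n); only the shuffling work drops to k steps.

```python
import random

def partial_shuffle_indices(n, k, rng=random):
    """Return k distinct random indices from range(n) using a partial
    Fisher-Yates shuffle (only the first k positions are shuffled)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    idx = list(range(n))
    for i in range(k):
        j = rng.randrange(i, n)          # choose from the not-yet-selected tail
        idx[i], idx[j] = idx[j], idx[i]  # move the choice into position i
    return idx[:k]

lstOne = [str(i) for i in range(1, 11)]
lstTwo = [str(i) for i in range(1, 11)]
chosen = partial_shuffle_indices(len(lstOne), 5)
newlstOne = [lstOne[i] for i in chosen]
newlstTwo = [lstTwo[i] for i in chosen]
print(newlstOne, newlstTwo)
```

In practice random.sample already returns distinct indices in one call, so this loop is mainly useful for seeing why the result has no repeats.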