I have a list that looks roughly like this:
Dalc = [1,2,1,1,1,1,1,1,1,1,1,1,5,1,1,3,1,2,1,1,1,1,2.......]
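It currently contains 395 elements, and I am trying to extend it so that it keeps the same percentages of 1s, 2s, 3s, 4s and 5s. Min = 1, Max = 5. My first attempt at extending the list to 10000 elements was this: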
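from random import randint
# ....
Dalc_add = []
dalc_max = max(Dalc)
dalc_min = min(Dalc)
i = 0
while i < 10000:
    # randint draws uniformly from [dalc_min, dalc_max], ignoring how often each value occurs
    Dalc_add.append(randint(dalc_min, dalc_max))
    i = i + 1
# extend (not append) so the new values are added individually, not as one nested list
Dalc.extend(Dalc_add)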
This gives a list whose first 395 elements keep the original bias, but the rest of the list looks like this:
[1,5,3,2,3,1,4,2,4,5,2,5,3,2,1,3,4,2,1,3,3,4,1........]
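There are now far more 3s, 4s and 5s, which completely ruins any statistical analysis I could run on it.
How can I extend the list above while preserving the weighting and bias of its values (in terms of how often each one occurs)?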
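To see how far the proportions drift, one could compare relative frequencies before and after extending. A minimal sketch using `collections.Counter`; the helper name `relative_frequencies` is hypothetical, and `original` is only the truncated sample shown above, not the full 395-element list.

```python
import random
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its share of the list, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return {v: counts[v] / total for v in sorted(counts)}

# Truncated sample from the question (the real list has 395 elements)
original = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 1, 1, 1, 2]

# Reproduce the uniform randint approach on the sample
random.seed(0)
lo, hi = min(original), max(original)
uniform_ext = original + [random.randint(lo, hi) for _ in range(10000)]

print(relative_frequencies(original))
print(relative_frequencies(uniform_ext))
```

On this sample, 1 makes up 18/23 (about 78%) of the values, while after the uniform extension each of 1 to 5 ends up near 20%, including 4, which does not occur in the truncated sample at all.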
Answer 0 (score: 2)
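You can use numpy.random.choice, which samples at random from the original list. Because you pass it the original list itself, you don't need to specify any weights: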
import numpy as np
Dalc = [1,2,1,1,1,1,1,1,1,1,1,1,5,1,1,3,1,2,1,1,1,1,2]
new_choices = np.random.choice(Dalc, size=10000)
Dalc += list(new_choices)
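Sampling uniformly from the list itself is equivalent to sampling its distinct values with probabilities equal to their relative frequencies. If you would rather make those weights explicit (for example, to inspect or store them), a sketch using `np.unique` and NumPy's newer Generator API (`np.random.default_rng`) might look like this; the variable names and the seed are illustrative only.

```python
import numpy as np

Dalc = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 1, 1, 1, 2]

# Distinct values and how many times each occurs in the original list
values, counts = np.unique(Dalc, return_counts=True)
probs = counts / counts.sum()  # relative frequencies, summing to 1

rng = np.random.default_rng(seed=42)
# Drawing the distinct values with these probabilities gives the same
# distribution as drawing uniformly from Dalc itself
new_choices = rng.choice(values, size=10000, p=probs)

# tolist() converts NumPy integers back to plain Python ints
Dalc_extended = Dalc + new_choices.tolist()
```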
Answer 1 (score: 1)
You have two options:
from random import choices
Dalc.extend(choices(Dalc, k=numTimes))
or
from numpy.random import choice
Dalc.extend(choice(Dalc, size=numTimes))
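Both pick numTimes random elements from Dalc (with replacement). Since every element is equally likely to be picked, each value is drawn in proportion to how often it occurs, so your weights stay the same.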
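The same distribution can also be produced with the standard library by passing the occurrence counts as weights to `random.choices`. A sketch, with `numTimes` set to an example value and a fixed seed for reproducibility:

```python
import random
from collections import Counter

Dalc = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 1, 1, 1, 2]
numTimes = 10000  # example value

counts = Counter(Dalc)
values = list(counts)                  # distinct values, in order of first appearance
weights = [counts[v] for v in values]  # occurrence counts used as relative weights

random.seed(1)
# Computed before extending, so the weights reflect only the original list
Dalc.extend(random.choices(values, weights=weights, k=numTimes))
```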
Which one you should use depends on two things: whether numTimes is large, and whether Dalc is large. Using timeit:
import timeit
print('Standard | Numpy')
print(timeit.timeit('choices([1,2,3,4,5], k=10000)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=10000)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5], k=1000)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=1000)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5], k=100)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=100)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5], k=10)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=10)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5], k=5)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=5)', setup='from numpy.random import choice', number=10000))
print()
print(timeit.timeit('choices([1,2,3,4,5]*10000, k=60)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5]*10000, size=60)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5]*1000, k=60)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5]*1000, size=60)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5]*100, k=60)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5]*100, size=60)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5]*10, k=60)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5]*10, size=60)', setup='from numpy.random import choice', number=10000))
print(timeit.timeit('choices([1,2,3,4,5], k=60)', setup='from random import choices', number=10000), end=' | ')
print(timeit.timeit('choice([1,2,3,4,5], size=60)', setup='from numpy.random import choice', number=10000))
gives us this output:
Standard | Numpy
25.372834796129872 | 1.8409739351390613
2.5144703081718696 | 0.316072358469512
0.2527455696737988 | 0.15912525398981003
0.03453532081119093 | 0.13720956183202304
0.021838018317897223 | 0.1544090297115197

1.2724984282899072 | 26.585005448108767
0.29600333450513006 | 2.7196871458182343
0.16926004909861803 | 0.4086584816186516
0.14861485298857957 | 0.16870138091688602
0.15621485532244606 | 0.1448146694886887
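So if numTimes is very large, NumPy is the clear winner, but if Dalc itself is very large (with numTimes fixed at 60 here), vanilla Python seems to be the better choice. Plain Python was also faster when numTimes was very small (k=10 and k=5).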
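One likely contributor to NumPy's slowdown with a large Dalc is that the sampler has to convert the Python list into an array on every call. If you sample repeatedly, a sketch that converts once and reuses the array might look like this. It uses NumPy's Generator API (`np.random.default_rng`) rather than the legacy `np.random.choice`, the helper `sample` is hypothetical, and whether this closes the gap on your machine is an assumption worth checking with the timing lines at the end.

```python
import timeit

import numpy as np

Dalc = [1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 3, 1, 2, 1, 1, 1, 1, 2] * 1000
rng = np.random.default_rng(0)

# Convert the list to an array once, outside the repeated sampling
dalc_arr = np.asarray(Dalc)

def sample(n):
    """Draw n values uniformly, with replacement, from the pre-converted array."""
    return rng.choice(dalc_arr, size=n)

batch = sample(60)

# Compare passing the list (converted on every call) with passing the array
t_list = timeit.timeit(lambda: rng.choice(Dalc, size=60), number=200)
t_arr = timeit.timeit(lambda: rng.choice(dalc_arr, size=60), number=200)
print(f"list: {t_list:.4f}s  array: {t_arr:.4f}s")
```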