The input list can contain more than 1 million numbers. When I run the following code with a small value for 'repeats', it works fine:
import random

def sample(x):
    length = 1000000
    new_array = random.sample(list(x), length)
    return new_array

def repeat_sample(x):
    repeats = 100
    list_of_samples = []
    for i in range(repeats):
        list_of_samples.append(sample(x))
    return list_of_samples

repeat_sample(large_array)
However, with a high number of repeats (such as the 100 above) it raises a MemoryError. The traceback is as follows:
Traceback (most recent call last):
  File "C:\Python31\rnd.py", line 221, in <module>
    STORED_REPEAT_SAMPLE = repeat_sample(STORED_ARRAY)
  File "C:\Python31\rnd.py", line 129, in repeat_sample
    list_of_samples.append(sample(x))
  File "C:\Python31\rnd.py", line 121, in sample
    new_array = random.sample((list(x)),length)
  File "C:\Python31\lib\random.py", line 309, in sample
    result = [None] * k
MemoryError
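I assume I am running out of memory. I don't know how to work around this problem.
Thank you for your time!
Answer 0 (score: 5)
Expanding on my comment:
Let's say the processing you do on each sample is computing its mean.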
def calc_means(samplelists):
    means = []
    n = float(len(samplelists[0]))
    for sample in samplelists:
        mean = sum(sample) / n
        means.append(mean)
    return means

calc_means(repeat_sample(large_array))
This forces you to keep all of those lists in memory at once. You can make it much lighter like this:
import random

def mean(sample, n):
    n = float(n)
    return sum(sample) / n

def sample(x):
    length = 1000000
    new_array = random.sample(x, length)
    return new_array

def repeat_means(x):
    repeats = 100
    list_of_means = []
    for i in range(repeats):
        s = sample(x)  # only one sample list is alive at a time
        list_of_means.append(mean(s, len(s)))
    return list_of_means

repeat_means(large_array)
But that is still not good enough... you can do all of this while building only your list of results:
import random

def sampling_mean(population, k, times):
    # Part of this is lifted straight from random.py
    _int = int
    _random = random.random
    n = len(population)
    kf = float(k)
    result = []
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    for t in range(times):
        selected = set()  # indices already picked for this sample
        sum_ = 0
        selected_add = selected.add
        for i in range(k):
            j = _int(_random() * n)
            while j in selected:
                j = _int(_random() * n)
            selected_add(j)
            sum_ += population[j]
        mean = sum_ / kf
        result.append(mean)
    return result

sampling_mean(large_array, 1000000, 100)
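Now, can your algorithm be streamlined like this?
Answer 1 (score: 4)
Two answers:
Unless you are on an old machine, it is unlikely that you have actually run out of memory. You get a MemoryError because you are probably running a 32-bit build of Python, which cannot allocate more than 2 GB of memory.
Your approach is wrong. You should use a random sample generator instead of building a list of samples.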
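A minimal sketch of the second point, assuming the per-sample processing is a mean. `iter_samples` is a hypothetical helper name, not part of the `random` module, and the small population and sizes are made up for illustration. It yields one sample at a time, so only a single sample list needs to be alive at any moment:

```python
import random

def iter_samples(population, k, repeats):
    """Yield `repeats` samples of size k, one at a time (hypothetical helper)."""
    for _ in range(repeats):
        yield random.sample(population, k)

# Small illustrative sizes; the question uses k = 1000000 and repeats = 100.
population = list(range(10000))
means = [sum(s) / float(len(s)) for s in iter_samples(population, 1000, 5)]
print(means)
```

Each sample becomes garbage as soon as its mean has been computed, so peak memory is roughly one sample plus the list of results.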
Answer 2 (score: 1)
A generator version of random.sample() would also help:
from random import random
from math import ceil as _ceil, log as _log

def xsample(population, k):
    """A generator version of random.sample"""
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    _int = int
    setsize = 21        # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4))  # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in range(k):          # invariant: non-selected at [0,n-i)
            j = _int(random() * (n-i))
            yield pool[j]
            pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in range(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                yield population[j]
        except (TypeError, KeyError):   # handle (at least) sets
            if isinstance(population, list):
                raise
            # Retry on an indexable copy, using this generator itself
            for x in xsample(tuple(population), k):
                yield x
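To show how such a generator might be consumed, here is a small self-contained sketch. `iter_sample` is a simplified, hypothetical stand-in for the generator above (set-based branch only), and `streaming_mean` is likewise an illustrative name. The point is that the sampled values are summed as they arrive and never collected into a list:

```python
import random

def iter_sample(population, k):
    """Simplified stand-in: lazily yield k distinct items from a sequence."""
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    selected = set()  # indices already yielded
    while len(selected) < k:
        j = random.randrange(n)
        if j not in selected:
            selected.add(j)
            yield population[j]

def streaming_mean(values, count):
    """Consume an iterable of numbers and return their mean."""
    total = 0.0
    for v in values:
        total += v
    return total / count

population = list(range(1000))
print(streaming_mean(iter_sample(population, 100), 100))
```

Note that the set of chosen indices still grows to k entries, so this saves the sample list itself, not all bookkeeping.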
Answer 3 (score: 0)
The only improvement you can make is to change your code to:
list_of_samples = [random.sample(x, length) for _ in range(repeats)]
However, that does not change the fact that, in the real world, you cannot create lists of arbitrary length.
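As a rough back-of-the-envelope check, the sketch below estimates how much memory the list slots alone would need for the numbers in the question. It assumes CPython, where a list stores one pointer per element and `random.sample` returns references to the existing objects rather than copies. It ignores per-list overhead and the temporary `list(x)` copy, so treat the result as a lower bound:

```python
import struct

repeats = 100
length = 1000000

# Size of one pointer in this interpreter: 4 bytes on a 32-bit build, 8 on 64-bit.
pointer_bytes = struct.calcsize("P")

# Each sample list holds `length` references; the sampled numbers themselves
# are shared with the population, so they are not counted again here.
slots_bytes = repeats * length * pointer_bytes
print("about %d MB for the list slots alone" % (slots_bytes // 2**20))
```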
Answer 4 (score: 0)
You could try using an array object: http://docs.python.org/py3k/library/array.html. It should be more memory-efficient than a list, but may be harder to use.
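A small sketch of what that could look like, assuming the data are floats (type code `'d'`). The sizes are illustrative, and the byte counts from `sys.getsizeof` are approximate and implementation-specific. Sampling is done on index positions here so that no copy of the data is needed:

```python
import random
import sys
from array import array

n = 100000
as_list = [random.random() for _ in range(n)]
as_array = array('d', as_list)  # packed C doubles, no per-item objects

# Approximate footprint: the list needs its slots plus one float object per item.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)
array_bytes = sys.getsizeof(as_array)
print(list_bytes, array_bytes)

# Sample positions from a range, then read the values out of the array.
picked = [as_array[i] for i in random.sample(range(n), 1000)]
print(sum(picked) / len(picked))
```

The trade-off is that an array holds only one numeric type, and reading an element creates a new Python object each time.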