I'm using a modified prime generator from here (and here), and the author suggests that the generator can be started at any point, as long as its internal data is consistent with the starting point. So I want to extract that data and store it for later use. This is my modified generator:
from itertools import count
def SieveLoadable(level=0,PData={}):
    "modified prime sieve from http://stackoverflow.com/a/19391111"
    if not PData.get(level):#no data, start from scratch
        for i in [2, 3, 5, 7]:
            yield i
        D,C={},9
        ps=SieveLoadable(level+1)
        next(ps)
        p=next(ps)
        psq=p*p
    else:#data! load it
        D,C,psq,p=PData[level]
        ps=SieveLoadable(level+1,PData)
        while p<=7:#have next level down skip the values it gave before
            p=next(ps)
    for i in count(C,2):
        if STOP:#set this outside of the generator
            #store data in a dict outside the generator
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down
            # it will hit this if statement and stop before changing its data
            break
        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step
This works to some extent, but I've noticed that stopping and restarting makes it skip some primes (for example, restarting it every 1 million primes makes it skip 32452883 and 32452909). How can I do this so that it doesn't skip any primes?
Here is how I call the generator:
import pickle, os
PrimeFolder='C:\\Primes'
sieveData='\\'.join([PrimeFolder,"Internals.dmp"])
Internals={}
STOP=False
numPrimes = 1e6
if not os.path.exists(PrimeFolder):os.makedirs(PrimeFolder)
if os.path.exists(sieveData):#load from previous run
    File=open(sieveData,'rb')
    Internals = pickle.load(File)
    File.close()
for i,p in enumerate(SieveLoadable(0,Internals)):
    #store p in a list here
    if not i:print('Starting at: {:,}'.format(p))
    if i>=numPrimes:#amount of primes to generate at a time
        #dump list of primes to file in this if statement
        print('Stopping at: {:,}'.format(p))
        STOP=True#stop the generator
File=open(sieveData,'wb')#save for next time
pickle.dump(Internals,File)
File.close()
While I started with this particular prime generator, any prime generator that can dump its data and reload it for later use would be appreciated.
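To put the goal in miniature (a toy counter generator, not the sieve — just to show the save/reload round trip I'm after, with hypothetical names):

```python
import pickle

def counter(state):
    '''toy generator: all resumable state lives in `state`'''
    n = state.get('n', 0)
    while not state.get('stop'):
        yield n
        n += 1
    state['n'] = n            # write the next value back for the reload

state = {}
gen = counter(state)
first = [next(gen) for _ in range(5)]     # 0, 1, 2, 3, 4
state['stop'] = True
next(gen, None)                           # let it write back and finish
blob = pickle.dumps(state)                # would be written to disk in real use

state2 = pickle.loads(blob)               # ...later: reload
state2['stop'] = False
gen2 = counter(state2)
more = [next(gen2) for _ in range(3)]     # resumes at 5
```

The hard part with the sieve is that its state is spread over a recursive tower of generators, not a single dict like this.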
Answer 0 (score: 2)
Setting your code aside, a comment about the algorithm: it recursively creates a tower of prime generators, each reaching up to the square root of the production point of the generator above it.
But that is mainly for simplicity of code. The innermost prime generator can be a regular, non-postponed sieve generator, as in the original ActiveState code. It only ever reaches the square root of the top generator's limit anyway, and the space complexity doesn't change, which is why this code shortcut was acceptable in the first place. The code can be seen in the test entry on Ideone, as described in my answer which you referenced.
That way, you need only two dictionaries to store and reload. You can even maintain the two dictionaries explicitly, inside one generator:
         /
        /
  generator        {primes are just produced}
       /
      /            {top internal dict uses separate supply of primes}
     /
  internal_loop    {each prime produced is added into the loop dict as well}
    / \
    \_____________/
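A minimal runnable sketch of that shape (names are illustrative, not taken from the linked code): a plain, non-postponed inner sieve over its own dict feeds the postponed outer loop over a second dict:

```python
from itertools import count

def two_level_sieve():
    '''sketch of the two-dictionary idea: a regular inner sieve
    supplies the sqrt-range primes for the postponed outer loop'''
    yield 2; yield 3; yield 5; yield 7
    inner = {}               # composites for the sqrt-level supply
    outer = {}               # composites for the main production loop
    def inner_primes():      # regular odd sieve, enough for the sqrt range
        for n in count(3, 2):
            s = inner.pop(n, 0)
            if not s:
                yield n
                inner[n*n] = 2*n
            else:
                n += s
                while n in inner:
                    n += s
                inner[n] = s
    ps = inner_primes()
    p = next(ps)             # p = 3
    psq = p*p
    for i in count(9, 2):    # postponed outer loop
        step = outer.pop(i, 0)
        if not step:
            if i < psq:
                yield i
                continue
            step = 2*p       # i == p*p: start crossing off p's multiples
            p = next(ps)
            psq = p*p
        i += step
        while i in outer:
            i += step
        outer[i] = step
```

Saving and reloading then reduces to storing `inner`, `outer`, the two loop positions, and the current `p`.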
This is the same as the difference, in Haskell, between using

  _Y g = g (_Y g)     -- recursive tower of generators

and

  _Y g = g x          -- two-staged production with
    where
    x = g x           -- an internal loop

to express the recursive creation of the supply.
Answer 1 (score: 1)
Using Will Ness' suggestion, I got the two-level form of his prime generator to pause/stop and restart at any point, provided it isn't stopped before the first 4 primes. Below are the modified generators.
The inner generator:
from itertools import count
def sieveL(l=0,Data={}):#l is not used, it's how the test calls the generators
    '''modified from
    http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
    L for loadable'''
    if Data.get('I',0):#there's data from a previous run, load it
        D,C=Data['I'] #I for inner
    else:#no data, start from scratch
        yield 2
        D={}
        C=3#starting counter
    for i in count(C,2):
        if STOP:
            Internals['I']=[D,i]#store the current counter and the internal data
            break
        s=D.pop(i, 0)
        if not s:
            yield i
            D[i*i]=2*i
        else:
            i+=s
            while i in D:i+=s
            D[i]=s
The outer generator:
from itertools import count
def postponed_sieveL(level=0,PData={}):
    '''uses a separate internal generator (two level form) - sieveL
    modified from
    https://stackoverflow.com/a/10733621, https://stackoverflow.com/a/19391111
    '''
    #this should be fine unless you stop the generator
    # before it passes the first four primes
    dat=PData.get(level,0)#load any previous data
    if not dat:#no previous data, start from 2
        for i in [2, 3, 5, 7]:
            yield i
        D,C={},9
        ps=sieveL('I',PData)#inner generator
        next(ps)
        p=next(ps)
        psq=p*p
    else:#past the initial primes, load from previous run
        D,C,psq,p=PData[level]
        ps=sieveL('I',PData)#inner generator
    for i in count(C,2):
        if STOP:#set this outside of the generator
            #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]
            w=next(ps)#inform the inner generator that it should stop
            break
        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step
While looking at this, I actually managed to turn his recursive prime generator into something that can be stopped and restarted. It's not as clean as I'd like, but it works. Handling those initial primes was harder than I expected. Here is the recursive form:
from itertools import count
def postponed_sieveR(level=0,PData={}):
    '''recursive form of postponed_sieve
    modified from
    https://stackoverflow.com/a/10733621
    https://stackoverflow.com/a/19391111
    '''
    dat=PData.get(level,[0,0,0])#load any previous data
    #initial data is stored as [0, current_prime, starting_index]
    if not dat[0]:# handling the initial primes
        init=[2, 3, 5, 7]
        p,srt=dat[1:]
        i=srt
        if p<=7:
            for i,p in enumerate(init[srt:]):#start
                i+=srt#correct the index
                if STOP:
                    break
                yield p
            #to prevent getting caught constantly returning 7 after reloads
            if p==init[-1]:p+=2
        if STOP:# store the data
            Internals[level]=[0,p,i]
            return# no lower levels, so return
        D,C={},9
        ps=postponed_sieveR(level+1,PData)
        next(ps)
        p=next(ps)
        psq=p*p
    else:#past the initial primes, load from previous run
        D,C,psq,p=PData[level]
        ps=postponed_sieveR(level+1,PData)
    for i in count(C,2):
        if STOP:#set this outside of the generator
            #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down
            #(it will hit this if statement and stop before changing its data)
            break
        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step
Interestingly, the recursive form uses about half the space of the two-level form. My initial thought was that they would be about the same, but thinking about it now, it makes sense: the recursive form is a stack of dictionaries, each storing the square-root amount it needs, while the two-level form stores a linear amount plus a square-root amount. Here is the code I used to test the generators, and its results. I may have overused format to make a fancy table.
import os
import pickle
from time import time
Internals={} #generator's internal data
STOP=False # flag for stopping the generator
Max=10639 # No. primes to generate per segment
segments=83 # No. segments to unload, reload the data from the generator
# name : generator
generators={'1 sieveL':sieveL,
            '2 postponed_sieveL - two level form':postponed_sieveL,
            '3 postponed_sieveR - recursive form':postponed_sieveR,
            }
print('Doing {:,} segment{}, each generating {:,} prime{} ({:,} prime{})\n'.format(
      segments,''if segments==1 else's',
      Max,''if Max==1 else's',Max*segments,''if Max*segments==1 else's'))
#find sum of primes of a non-paused generator for comparison
Pcom=0
t1=time()
for i,k in enumerate(postponed_sieveR()):
    Pcom+=k
    if i==(Max*segments)-1:
        break
del_t=time()-t1
NamCen=max([len(i)for i in generators.keys()])
col_1='Generator Name'.center(NamCen)
col_2='Sum of all generated primes'
col_3='Data size (Bytes)'
col_4='Time (Seconds)'
Size='N/A'
# table and non-paused generator
print(' | '.join([col_1,col_2,col_3,col_4]))
print(' | '.join(['-'*len(col_1),'-'*len(col_2),'-'*len(col_3),'-'*len(col_4)]))
print(' | '.join(['0 Non-paused sieve'.ljust(len(col_1)),'{:,}'.format(Pcom).center(len(col_2)),
                  Size.center(len(col_3)),'{:06.03f}'.format(del_t).center(len(col_4))]))
for name,gen in sorted(generators.items()):
    Psum=0#reset the sum and the data storage for the next sieve
    Internals={}
    t1=time()
    #print('\nstarting {}'.format(name))
    for segs in range(segments):
        for i,j in enumerate(gen(0,Internals)):
            Psum+=j
            if i==Max-1:
                STOP=True
        STOP=False
        #print('\tsegment {}; stopped at {}'.format(segs,j))
    del_t=time()-t1
    # dump data (this section can be commented out without issues)
    DataPath="C:\\data.prime"
    Data=open(DataPath,'wb')
    pickle.dump(Internals,Data)
    Data.close()
    Size=os.path.getsize(DataPath)# find size of data dump after last segment
    os.remove(DataPath)# then remove it, data was only dumped to find file size
    # show stats
    print(' | '.join([name.ljust(len(col_1)),'{:,}'.format(Psum).center(len(col_2)),
                      (Size if type(Size)==str else '{:,}'.format(Size)).center(len(col_3)),
                      '{:06.03f}'.format(del_t).center(len(col_4))]))
And its output:
Doing 83 segments, each generating 10,639 primes (883,037 primes)
Generator Name | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve | 5,774,833,097,284 | N/A | 03.114
1 sieveL | 5,774,833,097,284 | 24,195,100 | 04.219
2 postponed_sieveL - two level form | 5,774,833,097,284 | 16,618 | 03.175
3 postponed_sieveR - recursive form | 5,774,833,097,284 | 8,988 | 03.057
Edit, a year later: I've managed to make a better version of the saving recursive generator:
from itertools import count
def Generator(level=0, PData={}):
    '''prime generator that can save its internal data
    Refs: https://stackoverflow.com/a/10733621
          https://stackoverflow.com/a/19391111
          https://stackoverflow.com/a/23258396'''
    # this version works if you don't stop before the first 4 primes
    dat=PData.get(level, 0)
    if not dat:# handling the initial primes
        for p in [2, 3, 5, 7]:
            yield p
        if STOP:return#lowest level has nothing to store so return
        D,c={},9
        ps=Generator(level+1, PData)
        next(ps);p=next(ps)
        psq=p*p
    else: # past the initial primes, load from previous run
        D,c,p=PData[level]
        psq=p*p # need the squared prime, it was not stored
        g=0 # correction factor
        ps=Generator(level+1,PData)
        #if p<=7, it's the lowest level, so skip the values it gave before
        while g<p and p<=7:g=next(ps)
    #all primes after initial set
    for i in count(c, 2):
        if STOP:
            Internals[level]=[D,i,p]#store dict, current value and prime
            next(ps)#call the next level down
            #it will hit this if statement and stop before changing its data
            break
        step=D.pop(i, 0)
        if not step:
            if i<psq:
                yield i
                continue
            else:
                step=2*p
                p=next(ps)
                psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step
Instead of trying to handle those initial primes, I just ignore them. It works better and I think it's cleaner. When reloading the data, the lowest level simply skips the values it gave before. Here is a comparison with the above:
Doing 83 segments, each generating 10,639 primes (883,037 primes)
Generator Name | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve | 5,774,833,097,284 | N/A | 02.923
1 sieveL | 5,774,833,097,284 | 24,195,100 | 04.169
2 postponed_sieveL - two level form | 5,774,833,097,284 | 16,618 | 03.151
3 postponed_sieveR - recursive form | 5,774,833,097,284 | 8,988 | 03.007
4 Generator | 5,774,833,097,284 | 8,938 | 03.038
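One caveat for anyone running this on current Python: the stop mechanism relies on a StopIteration escaping the next(ps) call inside a generator, which PEP 479 (Python 3.7+) turns into a RuntimeError. A sketch of a PEP 479-safe variant of the saving generator above, with a small driver that checks no primes are skipped or repeated across reloads (the pickle round trip is stood in for by a plain dict):

```python
from itertools import count

STOP = False      # set from outside to request a stop
Internals = {}    # receives each level's state when stopping

def Generator(level=0, PData={}):
    '''PEP 479-safe variant of the saving recursive generator:
    the downward "stop" call is wrapped in try/except so no
    StopIteration escapes a generator frame.'''
    dat = PData.get(level, 0)
    if not dat:                        # fresh start: hand out the initial primes
        for p in [2, 3, 5, 7]:
            yield p
        if STOP: return                # nothing to store yet at this level
        D, c = {}, 9
        ps = Generator(level+1, PData)
        next(ps); p = next(ps)         # discard 2, take p = 3
        psq = p*p
    else:                              # resume from stored [dict, counter, prime]
        D, c, p = PData[level]
        psq = p*p
        ps = Generator(level+1, PData)
        g = 0
        while g < p and p <= 7:        # lowest level restarts fresh: skip past p
            g = next(ps)
    for i in count(c, 2):
        if STOP:
            Internals[level] = [D, i, p]   # save this level first...
            try:
                next(ps)                   # ...then tell the level below to save
            except StopIteration:
                pass
            break
        step = D.pop(i, 0)
        if not step:
            if i < psq:
                yield i
                continue
            step = 2*p
            p = next(ps)
            psq = p*p
        i += step
        while i in D:
            i += step
        D[i] = step

def run_segments(n_segments, per_segment):
    '''produce primes in segments, saving and "reloading" between them'''
    global STOP, Internals
    saved, collected = {}, []
    for _ in range(n_segments):
        STOP, Internals = False, {}
        for i, p in enumerate(Generator(0, saved)):
            collected.append(p)
            if i == per_segment - 1:
                STOP = True            # generator saves and exits on next call
        saved = Internals              # pickle.dump/pickle.load in real use
    return collected

def ref_primes(n):                     # trial-division reference for checking
    out, k = [], 2
    while len(out) < n:
        if all(k % q for q in out if q*q <= k):
            out.append(k)
        k += 1
    return out

stitched = run_segments(4, 50)         # 4 stop/reload cycles, 50 primes each
```

On Python 2 (which the test code above, with its `print` statement, was written for) the original cascade works as-is; the try/except only matters on modern Python 3.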