Loadable prime generator

Date: 2014-04-17 00:28:46

Tags: python primes

I'm using a modified prime generator from here (and here), and the author mentioned that where the generator starts is arbitrary as long as its internal data is consistent with that starting point. So I'd like to extract that data and store it for later use. Here is my modified generator:

from itertools import count
def SieveLoadable(level=0,PData={}):
    "modified prime sieve from http://stackoverflow.com/a/19391111"
    if not PData.get(level):#no data, start from scratch
        for i in[2, 3, 5, 7]:
            yield i
        D,C={},9
        ps=SieveLoadable(level+1)
        next(ps)
        p=next(ps)
        psq=p*p

    else:#data! load it
        D,C,psq,p=PData[level]
        ps=SieveLoadable(level+1,PData)
        while p<=7:#have next level down skip the values it gave before
            p=next(ps)

    for i in count(C,2):
        if STOP:#set this outside of the generator
        #store data in a dict outside the generator
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down
            # it will hit this if statement and stop before changing its data
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p

        i+=step
        while i in D:
            i+=step
        D[i]=step

This works to an extent, but I noticed that stopping and restarting makes it skip some primes (for example, restarting it every 1 million primes makes it skip 32452883 and 32452909). How can I make it so that it doesn't skip any primes?

Here is how I call the generator:

import pickle, os
PrimeFolder='C:\\Primes'
sieveData='\\'.join([PrimeFolder,"Internals.dmp"])
Internals={}
STOP=False
numPrimes = 1e6

if not os.path.exists(PrimeFolder):os.makedirs(PrimeFolder)

if os.path.exists(sieveData):#load from previous run
    File=open(sieveData,'rb')
    Internals = pickle.load(File)
    File.close()

for i,p in enumerate(SieveLoadable(0,Internals)):
    #store p in a list here

    if not i:print('Starting at: {:,}'.format(p))
    if i>=numPrimes:#amount of primes to generate at a time
        #dump list of primes to file in this if statement

        print('Stopping at: {:,}'.format(p))
        STOP=True#stop the generator

File=open(sieveData,'wb')#save for next time
pickle.dump(Internals,File)
File.close()

While I started out with this particular prime generator, any prime generator that can dump its data and reload it for later use would be appreciated.

2 Answers:

Answer 0 (score: 2)

Your code aside, a comment about the algorithm: it recursively creates a tower of prime generators, each one reaching up to the square root of the production point of the generator above it.

But that is mostly for the simplicity of the code. The inner primes generator can be a regular, non-postponed sieve generator, as in the original ActiveState code. It only ever reaches the square root of the top generator's limit anyway, and the space complexity does not change, which is why this code shortcut is acceptable in the first place. The code can be seen in the test entry on Ideone, as mentioned in my answer that you refer to.
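
For reference, that regular, non-postponed dict sieve is along these lines (a rough sketch in the spirit of the ActiveState recipe, not a verbatim copy of it):

from itertools import count

def eratosthenes():
    '''plain dict-based sieve: every prime gets its marker entry immediately,
    so the dict grows linearly with the number of primes produced'''
    D = {}                          # maps a composite to the primes that divide it
    for q in count(2):
        if q not in D:
            yield q                 # q is prime
            D[q*q] = [q]            # first composite that needs marking is q squared
        else:
            for p in D.pop(q):      # slide each witness prime to its next multiple
                D.setdefault(p+q, []).append(p)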

That way you would only have two dictionaries to store and reload. You could even maintain the two dictionaries explicitly, inside one generator:

                        /
                       /
              generator          {primes are just produced}
             /
            /              {top internal dict uses separate supply of primes}
           /
   internal_loop       {each prime produced is added into the loop dict as well}
  /             \
  \_____________/

This is the same as the difference, in Haskell, between

_Y g = g (_Y g)      -- recursive tower of generators

and

_Y g = g x           -- two-staged production with
    where
         x = g x     --   an internal loop

for expressing the recursive creation of the supply.
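
To make the "two dictionaries inside one generator" idea concrete, here is a rough sketch of that structure (just an illustration of the diagram above, not the tested code from the Ideone link):

from itertools import count

def postponed_sieve_flat():
    '''sketch: postponed sieve whose base-prime supply is kept as a second,
    explicit dict (the inner loop) instead of another generator level'''
    for n in (2, 3, 5, 7):
        yield n
    D = {}                 # outer dict: next composite for each base prime in use
    inner = {9: 6}         # inner plain sieve's dict; 3 is already a base prime
    c = 3                  # last candidate the inner sieve has examined
    p, psq = 3, 9          # current base prime and its square
    for i in count(9, 2):
        if i in D:
            step = D.pop(i)
        elif i < psq:
            yield i        # below the square of the current base prime: prime
            continue
        else:              # i == psq: retire p, fetch the next base prime
            step = 2*p
            while True:    # run the inner loop until it finds the next prime
                c += 2
                if c in inner:
                    s = inner.pop(c)
                    n = c + s
                    while n in inner:
                        n += s
                    inner[n] = s
                else:
                    inner[c*c] = 2*c
                    p, psq = c, c*c
                    break
        i += step
        while i in D:
            i += step
        D[i] = step

To save and reload its state, only the two dicts D and inner plus the counters c, p and i would need to be stored; psq can be recomputed from p.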

Answer 1 (score: 1)

Using Will Ness's suggestion, I got the two-level form of his prime generator to pause/stop and start at any point, as long as it is not stopped before it has passed the first 4 primes. Below are the modified generators.

The inner generator:

from itertools import count
def sieveL(l=0,Data={}):#l is not used, it's just how the test calls the generators
    '''modified from 
    http://code.activestate.com/recipes/117119-sieve-of-eratosthenes/
    L for loadable'''


    if Data.get('I',0):#there's data from a previous run, load it
        D,C=Data['I'] #I for inner

    else:#no data, start from scratch
        yield 2 
        D={}
        C=3#starting counter

    for i in count(C,2):
        if STOP:
            Internals['I']=[D,i]#store the current counter and the internal data
            break
        s=D.pop(i, 0)
        if not s:
            yield i
            D[i*i]=2*i
        else:
            i+=s
            while i in D:i+=s
            D[i]=s

The outer generator:

from itertools import count
def postponed_sieveL(level=0,PData={}):
    '''uses a separate internal generator (two level form) - sieveL
    modified from 
    https://stackoverflow.com/a/10733621, https://stackoverflow.com/a/19391111
    '''
    #this should be fine unless you stop the generator 
    #  before it passes the first four primes

    dat=PData.get(level,0)#load any previous data
    if not dat:#no previous data, start from 2
        for i in [2, 3, 5, 7]:
            yield i

        D,C={},9
        ps=sieveL('I',PData)#inner generator
        next(ps)
        p=next(ps)
        psq=p*p

    else:#past the initial primes, load from previous run
        D,C,psq,p=PData[level]
        ps=sieveL('I',PData)#inner generator

    for i in count(C,2):
        if STOP:#set this outside of the generator

        #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]

            w=next(ps)#inform the inner generator that it should stop
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step

While looking at this, I actually managed to turn his recursive prime generator into something that can be stopped and started as well. It isn't as clean as I would like, but it works. Handling those initial primes was harder than I expected. Here is the recursive form:

from itertools import count
def postponed_sieveR(level=0,PData={}):
    '''recursive form of postponed_sieve
    modified from 
    https://stackoverflow.com/a/10733621
    https://stackoverflow.com/a/19391111
    '''
    dat=PData.get(level,[0,0,0])#load any previous data
    #initial data is stored as [0, current_prime, starting_index]

    if not dat[0]:# handling the initial primes
        init=[2, 3, 5, 7]
        p,srt=dat[1:]
        i=srt

        if p<=7:
            for i,p in enumerate(init[srt:]):#start
                i+=srt#correct the index
                if STOP:
                    break
                yield p

                #to prevent getting caught constantly returning 7 after reloads
                if p==init[-1]:p+=2

        if STOP:# store the data
            Internals[level]=[0,p,i]
            return# no lower levels, so return

        D,C={},9
        ps=postponed_sieveR(level+1,PData)
        next(ps)
        p=next(ps)
        psq=p*p

    else:#past the initial primes, load from previous run
        D,C,psq,p=PData[level]
        ps=postponed_sieveR(level+1,PData)

    for i in count(C,2):

        if STOP:#set this outside of the generator

            #dict, current index (count value), squared prime and prime
            Internals[level]=[D,i,psq,p]
            w=next(ps)#call the next level down 
            #(it will hit this if statement and stop before changing its data)
            break

        if i in D:
            step=D.pop(i)
        elif i<psq:
            yield i
            continue
        else:
            step=2*p
            p=next(ps)
            psq=p*p

        i+=step
        while i in D:
            i+=step
        D[i]=step

Interestingly, the recursive form uses about half the space of the two-level form. My initial thought was that they would be roughly the same, but thinking about it now it makes sense: the recursive form is a stack of dictionaries, each one storing about the square root of what the level above it needs, while the two-level form stores a linear amount plus a square-root amount. Here is the code I used to test the generators, along with its results. I may have gone overboard with the formatting to make a nice table.

import os
import pickle
from time import time

Internals={} #generator's internal data
STOP=False # flag for stopping the generator

Max=10639 # No. primes to generate per segment
segments=83 # No. segments to unload, reload the data from the generator

            # name  : generator
generators={'1 sieveL':sieveL,
            '2 postponed_sieveL - two level form':postponed_sieveL,
            '3 postponed_sieveR - recursive form':postponed_sieveR,
            }

print('Doing {:,} segment{}, each generating {:,} prime{} ({:,} prime{})\n'.format(
      segments, '' if segments==1 else 's',
      Max, '' if Max==1 else 's',
      Max*segments, '' if Max*segments==1 else 's'))
#find sum of primes of a non paused generator for comparison
Pcom=0
t1=time()
for i,k in enumerate(postponed_sieveR()):
    Pcom+=k
    if i==(Max*segments)-1:
        break
del_t=time()-t1

NamCen=max([len(i)for i in generators.keys()])
col_1='Generator Name'.center(NamCen)
col_2='Sum of all generated primes'
col_3='Data size (Bytes)'
col_4='Time (Seconds)'
Size='N/A'

# table and non paused generator
print(' | '.join([col_1,col_2,col_3,col_4]))
print(' | '.join(['-'*len(col_1),'-'*len(col_2),'-'*len(col_3),'-'*len(col_4)])) 
print(' | '.join(['0 Non-paused sieve'.ljust(len(col_1)),'{:,}'.format(Pcom).center(len(col_2)),
                  Size.center(len(col_3)),'{:06.03f}'.format(del_t).center(len(col_4))]))

for name,gen in sorted(generators.items()):
    Psum=0#reset the sum and the data storage for the next sieve
    Internals={}
    t1=time()

    #print('\nstarting {}'.format(name))
    for segs in range(segments):
        for i,j in enumerate(gen(0,Internals)):
            Psum+=j
            if i==Max-1:
                STOP=True
        STOP=False
        #print('\tsegment {}; stopped at {}'.format(segs,j))
    del_t=time()-t1

    # dump data (this section can be commented out without issues)
    DataPath="C:\\data.prime"
    Data=open(DataPath,'wb')
    pickle.dump(Internals,Data)
    Data.close()
    Size=os.path.getsize(DataPath)# find size of data dump after last segment
    os.remove(DataPath)# then remove it, data was only dumped to find file size

    # show stats
    print(' | '.join([name.ljust(len(col_1)),'{:,}'.format(Psum).center(len(col_2)),
                     (Size if type(Size)==str else '{:,}'.format(Size)).center(len(col_3)),
                      '{:06.03f}'.format(del_t).center(len(col_4))]))

And its output:

Doing 83 segments, each generating 10,639 primes (883,037 primes)

           Generator Name           | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve                  |      5,774,833,097,284      |        N/A        |     03.114    
1 sieveL                            |      5,774,833,097,284      |     24,195,100    |     04.219
2 postponed_sieveL - two level form |      5,774,833,097,284      |       16,618      |     03.175    
3 postponed_sieveR - recursive form |      5,774,833,097,284      |       8,988       |     03.057    
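
As a rough way of seeing where the space goes, the saved state can be inspected after a run. This small sketch only assumes that Internals is populated as in the generators above, i.e. each stored level is a list whose first item is that level's sieve dict (or 0 for the lowest level of the recursive form):

for lvl, state in Internals.items():
    d = state[0]                       # the level's sieve dict, or 0
    n = len(d) if hasattr(d, '__len__') else 0
    print('level {!r}: {:,} dict entries'.format(lvl, n))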

Year+ edit: I have since been able to make a better version of the recursive generator that saves its data:

from itertools import count

def Generator(level=0, PData={}):
    '''prime generator that can save its internal data
    Refs:  https://stackoverflow.com/a/10733621
           https://stackoverflow.com/a/19391111
           https://stackoverflow.com/a/23258396'''
    # this version works if you don't stop before the first 4 primes
    dat=PData.get(level, 0)

    if not dat:# handling the initial primes
        for p in [2, 3, 5, 7]:
            yield p

        if STOP:return#lowest level has nothing to store so return

        D,c={},9
        ps=Generator(level+1, PData)
        next(ps);p=next(ps)
        psq=p*p

    else:       # past the initial primes, load from previous run
        D,c,p=PData[level]
        psq=p*p # need the squared prime, it was not stored
        g=0     # correction factor
        ps=Generator(level+1,PData)

        #if p<=7, it's the lowest level, so skip the values it gave before.
        while g<p and p<=7:g=next(ps)

    #all primes after initial set 
    for i in count(c, 2):

        if STOP:
            Internals[level]=[D,i,p]#store dict, current value and prime
            next(ps)#call the next level down
            #it will hit this if statement and stop before changing its data
            break

        step=D.pop(i, 0)
        if not step:
            if i<psq:
                yield i
                continue
            else:
                step=2*p
                p=next(ps)
                psq=p*p
        i+=step
        while i in D:
            i+=step
        D[i]=step

Rather than trying to handle those initial primes, I just ignore them. It works better and I think it is cleaner. On reloading the data, whichever level was previously the lowest is brought back in sync by skipping past the values it already gave. Here is the comparison with the above:

Doing 83 segments, each generating 10,639 primes (883,037 primes)

           Generator Name           | Sum of all generated primes | Data size (Bytes) | Time (Seconds)
----------------------------------- | --------------------------- | ----------------- | --------------
0 Non-paused sieve                  |      5,774,833,097,284      |        N/A        |     02.923    
1 sieveL                            |      5,774,833,097,284      |     24,195,100    |     04.169    
2 postponed_sieveL - two level form |      5,774,833,097,284      |       16,618      |     03.151    
3 postponed_sieveR - recursive form |      5,774,833,097,284      |       8,988       |     03.007    
4 Generator                         |      5,774,833,097,284      |       8,938       |     03.038
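
For completeness, a minimal usage sketch of the final Generator, following the same save/reload pattern as the calling code in the question. It assumes Generator, STOP and Internals all live in the same module (as in the test code above) and, as noted, that the run is not stopped before the first four primes; the file name and the number of primes per run are arbitrary:

import os
import pickle

Internals = {}                       # the generator writes its state here on STOP
STOP = False
stateFile = 'Generator_state.dmp'    # arbitrary file name for this example
perRun = 10639                       # primes to generate in this run

if os.path.exists(stateFile):        # resume from a previous run
    with open(stateFile, 'rb') as f:
        Internals = pickle.load(f)

for i, p in enumerate(Generator(0, Internals)):
    # store or use p here
    if i >= perRun - 1:
        STOP = True                  # generator saves its state and stops itself

with open(stateFile, 'wb') as f:     # keep the state for the next run
    pickle.dump(Internals, f)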