正确设置随机种子以确保可重复性

时间:2018-08-17 10:46:30

标签: fortran gfortran fortran90 intel-fortran

使用Fortran 90子例程random_seed设置随机种子的方法非常简单。

call random_seed( put=seed )

但是我找不到有关设置种子准则的任何信息(当您需要可重复性时,这是绝对必要的)。我过去听过的民俗学建议标量种子应该很大。例如。 123456789比123更好。我可以在网上找到对此的唯一支持是,建议对ifort扩展功能ran()使用一个"large, odd integer value"

我了解这可能是特定于实现的,并且正在使用gfortran 4.8.5,但也对ifort和(如果可能)与实现无关的一般准则感兴趣。这是一些示例代码:

# for compactness, assume seed size of 4, but it will depend on 
# the implementation (e.g. for my version of gfortran 4.8.5 it is 12)

seed1(1:4) = [ 123456789, 987654321, 456789123, 7891234567 ]
seed2(1:4) =   123456789
seed3(1:4) = [         1,         2,         3,          4 ]

我猜想seed1很好,但是如果您手动设置(如我一样),则相当冗长,因为种子长度可以是12或33或其他任何值。而且我什至不确定这是否还好,因为我根本找不到关于设置这些种子的任何指南。即就我所知,这些种子应该是负数,或者是3位数的偶数,等等。尽管我猜您希望实现会警告您这一点(?)。

seed2seed3显然更方便设置,而且我所知道的都一样。 @Ross建议seed2在这里的答案中实际上是可以的:Random number generator (RNG/PRNG) that returns updated value of seed

因此,我的总结问题是:如何正确设置种子? seed1seed3中的任何一个或全部可接受吗?

2 个答案:

答案 0 :(得分:5)

设置种子的准则取决于RANDOM_NUMBER使用的PRNG算法,但总的来说,您提供的“熵”越多越好。

如果只有一个标量值,则可以使用一些简单的PRNG将其扩展为RANDOM_SEED所需的完整种子数组。参见例如https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gfortran/RANDOM_005fSEED.html

中示例代码中的函数lcg

当前版本的GFortran具有一些针对不良种子的保护措施,应该相对不受“哑”种子的影响(例如seed(:)的所有值都相同,或者所有值很小甚至为零),但具有移植性。遵循我上面建议的其他编译器可能仍然是一个好主意。

答案 1 :(得分:2)

您提供给random_seed( put=... )的内容用于确定生成器的启动状态,该启动状态(如詹尼布状态)应具有尽可能合理的熵。您可以构建一些相对复杂的方法来生成该熵-从系统中抓取某种方式是一个不错的选择。 janneb链接代码就是一个很好的例子。

但是,如果有必要,我通常希望能够从给定的种子复制一次运行。这对于调试和回归测试很有用。然后,对于生产运行,代码可以以某种方式“随机”提取单个种子。因此,我想从一个“种子”中获得良好的RNG。以我的经验,这是很容易实现的,方法是提供单个种子,然后让生成器通过生成数字来增加熵。考虑以下示例:

program main
   implicit none
   integer, parameter :: wp = selected_real_kind(15,307)
   integer, parameter :: n_discard = 100

   integer :: state_size, i
   integer, allocatable, dimension(:) :: state
   real(wp) :: ran, oldran

   call random_seed( size=state_size )
   write(*,*) '-- state size is: ', state_size

   allocate(state(state_size))

   ! -- Simple method of initializing seed from single scalar
   state = 20180815
   call random_seed( put=state )

   ! -- 'Prime' the generator by pulling the first few numbers
   ! -- In reality, these would be discarded but I will print them for demonstration
   ran = 0.5_wp
   do i=1,n_discard
      oldran = ran
      call random_number(ran)
      write(*,'(a,i3,2es26.18)') 'iter, ran, diff: ', i, ran, ran-oldran
   enddo

   ! Now the RNG is 'ready'
end program main

在这里,我给一个种子,然后生成一个随机数100次。通常,我会丢弃这些可能会损坏的初始数字。在此示例中,我正在打印它们以查看它们是否看起来是非随机的。在PGI 15.10上运行:

enet-mach5% pgfortran --version

pgfortran 15.10-0 64-bit target on x86-64 Linux -tp sandybridge 
The Portland Group - PGI Compilers and Tools
Copyright (c) 2015, NVIDIA CORPORATION.  All rights reserved.
enet-mach5% pgfortran main.f90 && ./a.out
 -- state size is:            34
iter, ran, diff:   1  8.114813341476008191E-01  3.114813341476008191E-01
iter, ran, diff:   2  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   3  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   4  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   5  8.114813341476008191E-01  0.000000000000000000E+00
iter, ran, diff:   6  2.172220012214012286E-01 -5.942593329261995905E-01
iter, ran, diff:   7  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   8  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:   9  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  10  2.172220012214012286E-01  0.000000000000000000E+00
iter, ran, diff:  11  6.229626682952016381E-01  4.057406670738004095E-01
iter, ran, diff:  12  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  13  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  14  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  15  6.229626682952016381E-01  0.000000000000000000E+00
iter, ran, diff:  16  2.870333536900204763E-02 -5.942593329261995905E-01
iter, ran, diff:  17  2.870333536900204763E-02  0.000000000000000000E+00
iter, ran, diff:  18  4.344440024428024572E-01  4.057406670738004095E-01
iter, ran, diff:  19  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  20  4.344440024428024572E-01  0.000000000000000000E+00
iter, ran, diff:  21  8.401846695166028667E-01  4.057406670738004095E-01
iter, ran, diff:  22  8.401846695166028667E-01  0.000000000000000000E+00
iter, ran, diff:  23  6.516660036642036857E-01 -1.885186658523991809E-01
iter, ran, diff:  24  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  25  6.516660036642036857E-01  0.000000000000000000E+00
iter, ran, diff:  26  5.740667073800409526E-02 -5.942593329261995905E-01
iter, ran, diff:  27  5.740667073800409526E-02  0.000000000000000000E+00
iter, ran, diff:  28  2.746286719594053238E-01  2.172220012214012286E-01
iter, ran, diff:  29  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  30  2.746286719594053238E-01  0.000000000000000000E+00
iter, ran, diff:  31  6.803693390332057334E-01  4.057406670738004095E-01
iter, ran, diff:  32  6.803693390332057334E-01  0.000000000000000000E+00
iter, ran, diff:  33  3.033320073284073715E-01 -3.770373317047983619E-01
iter, ran, diff:  34  3.033320073284073715E-01  0.000000000000000000E+00
iter, ran, diff:  35  7.090726744022077810E-01  4.057406670738004095E-01
iter, ran, diff:  36  1.148133414760081905E-01 -5.942593329261995905E-01
iter, ran, diff:  37  1.148133414760081905E-01  0.000000000000000000E+00
iter, ran, diff:  38  1.435166768450102381E-01  2.870333536900204763E-02
iter, ran, diff:  39  1.435166768450102381E-01  0.000000000000000000E+00
iter, ran, diff:  40  3.607386780664114667E-01  2.172220012214012286E-01
iter, ran, diff:  41  7.664793451402118762E-01  4.057406670738004095E-01
iter, ran, diff:  42  7.664793451402118762E-01  0.000000000000000000E+00
iter, ran, diff:  43  2.009233475830143334E-01 -5.655559975571975428E-01
iter, ran, diff:  44  2.009233475830143334E-01  0.000000000000000000E+00
iter, ran, diff:  45  6.353673500258167905E-01  4.344440024428024572E-01
iter, ran, diff:  46  4.110801709961720007E-02 -5.942593329261995905E-01
iter, ran, diff:  47  4.110801709961720007E-02  0.000000000000000000E+00
iter, ran, diff:  48  8.812926866162200668E-01  8.401846695166028667E-01
iter, ran, diff:  49  8.812926866162200668E-01  0.000000000000000000E+00
iter, ran, diff:  50  9.386993573542241620E-01  5.740667073800409526E-02
iter, ran, diff:  51  3.444400244280245715E-01 -5.942593329261995905E-01
iter, ran, diff:  52  7.501806915018249811E-01  4.057406670738004095E-01
iter, ran, diff:  53  9.961060280922282573E-01  2.459253365904032762E-01
iter, ran, diff:  54  9.961060280922282573E-01  0.000000000000000000E+00
iter, ran, diff:  55  8.221603419923440015E-02 -9.138899938929938571E-01
iter, ran, diff:  56  4.879567012730348097E-01  4.057406670738004095E-01
iter, ran, diff:  57  1.109193695682364478E-01 -3.770373317047983619E-01
iter, ran, diff:  58  7.625853732324401335E-01  6.516660036642036857E-01
iter, ran, diff:  59  7.625853732324401335E-01  0.000000000000000000E+00
iter, ran, diff:  60  2.831393817822487335E-01 -4.794459914501914000E-01
iter, ran, diff:  61  6.888800488560491431E-01  4.057406670738004095E-01
iter, ran, diff:  62  7.462867195940532383E-01  5.740667073800409526E-02
iter, ran, diff:  63  8.036933903320573336E-01  5.740667073800409526E-02
iter, ran, diff:  64  8.036933903320573336E-01  0.000000000000000000E+00
iter, ran, diff:  65  1.644320683984688003E-01 -6.392613219335885333E-01
iter, ran, diff:  66  5.701727354722692098E-01  4.057406670738004095E-01
iter, ran, diff:  67  6.849860769482774003E-01  1.148133414760081905E-01
iter, ran, diff:  68  1.481334147600819051E-01 -5.368526621881954952E-01
iter, ran, diff:  69  5.538740818338823146E-01  4.057406670738004095E-01
iter, ran, diff:  70  1.605380964906970576E-01 -3.933359853431852571E-01
iter, ran, diff:  71  5.662787635644974671E-01  4.057406670738004095E-01
iter, ran, diff:  72  7.672021111475118005E-01  2.009233475830143334E-01
iter, ran, diff:  73  6.360901160331167148E-01 -1.311119951143950857E-01
iter, ran, diff:  74  6.647934514021187624E-01  2.870333536900204763E-02
iter, ran, diff:  75  9.231234697231371911E-01  2.583300183210184287E-01
iter, ran, diff:  76  3.288641367969376006E-01 -5.942593329261995905E-01
iter, ran, diff:  77  5.034149292976053403E-02 -2.785226438671770666E-01
iter, ran, diff:  78  3.249701648891658579E-01  2.746286719594053238E-01
iter, ran, diff:  79  4.110801709961720007E-01  8.611000610700614288E-02
iter, ran, diff:  80  7.268168600551945246E-01  3.157366890590225239E-01
iter, ran, diff:  81  1.325575271289949342E-01 -5.942593329261995905E-01
iter, ran, diff:  82  2.147735613282293343E-01  8.221603419923440015E-02
iter, ran, diff:  83  8.951429003614350677E-01  6.803693390332057334E-01
iter, ran, diff:  84  9.606624794444940107E-02 -7.990766524169856666E-01
iter, ran, diff:  85  8.749502748152764298E-01  7.788840268708270287E-01
iter, ran, diff:  86  6.864316089628772488E-01 -1.885186658523991809E-01
iter, ran, diff:  87  3.753116578189263919E-01 -3.111199511439508569E-01
iter, ran, diff:  88  4.614216639259325348E-01  8.611000610700614288E-02
iter, ran, diff:  89  8.632683590919612016E-01  4.018466951660286668E-01
iter, ran, diff:  90  5.110403908483931446E-01 -3.522279682435680570E-01
iter, ran, diff:  91  3.512250603649960112E-01 -1.598153304833971333E-01
iter, ran, diff:  92  2.984351275420635830E-01 -5.278993282293242828E-02
iter, ran, diff:  93  7.902858007228701354E-01  4.918506731808065524E-01
iter, ran, diff:  94  9.136098520217217356E-01  1.233240512988516002E-01
iter, ran, diff:  95  8.360105557375590024E-01 -7.759929628416273317E-02
iter, ran, diff:  96  7.623052313611680120E-01 -7.370532437639099044E-02
iter, ran, diff:  97  2.525198759725810760E-02 -7.370532437639099044E-01
iter, ran, diff:  98  9.228433278518650695E-01  8.975913402546069619E-01
iter, ran, diff:  99  1.283834133499510699E-01 -7.944599145019139996E-01
iter, ran, diff: 100  7.311534560989940701E-01  6.027700427490430002E-01

生成的前10个数字中有8个重复!这很好地说明了为什么某些发生器首先需要高熵状态。但是,在“一段时间”之后,数字开始看起来合理。

对于我的应用程序,100个左右的随机数花费很小,因此,每当我生成一个生成器时,我都会以这种方式对它们进行初始化。我没有在ifort 16.0,gfortran 4.8或gfortran 8.1上观察到这种明显的不良行为。但是,非重复数字是一个很低的门槛。因此,我将为所有编译器准备就绪,而不仅仅是我观察到不良行为的那些编译器。

从注释中,一些编译器试图通过以某种方式处理输入状态以产生实际内部状态来消除不良行为。 Gfortran使用“异或密码”。在get上进行相反的操作。