Using a .txt file of random numbers with the diehard test suite

Time: 2015-10-05 17:07:05

Tags: testing random

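I have several .txt files, each containing a large number of integers (about 2.5 million) generated by various RNGs. I want to test these RNGs with the diehard test suite.

The .txt files look like this: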

#==============================================
# generator Park       seed = 1
#=============================================
type: d
count: 2500000
numbit: 32
16807
282475249

followed by more integers. I run diehard on this .txt file with the following command:

dieharder -f randdata.txt -a -g 202

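My question is whether my .txt file is correct (especially the first few lines), and why those lines are required. I ask because every .txt file, whichever RNG generated it (some good, some bad), fails nearly every test, and I wonder whether I made a mistake in passing the .txt file to diehard. Surely my RNGs cannot all be bad.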

2 answers:

Answer 0: (score: 9)

Yes, that input file looks correct. It seems that many dieharder tests fail even on a 10M-integer input produced by dieharder's own generator:

$ dieharder -o -f example.input -t 10000000 # Generate an input file
$ head -n 10 example.input
#==================================================================
# generator mt19937  seed = 3423143424
#==================================================================
type: d
count: 10000000
numbit: 32
2310531048
 808929469
2423056114
4237891648
$ dieharder -a -g 202 -f example.input 
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
     file_input|                   example.input|  2.50e+06  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
# The file file_input was rewound 1 times
   diehard_birthdays|   0|       100|     100|0.07531570|  PASSED  
# The file file_input was rewound 11 times
      diehard_operm5|   0|   1000000|     100|0.00000000|  FAILED  
# The file file_input was rewound 24 times
  diehard_rank_32x32|   0|     40000|     100|0.00047786|   WEAK   
# The file file_input was rewound 30 times
    diehard_rank_6x8|   0|    100000|     100|0.38082242|  PASSED  
# The file file_input was rewound 32 times
   diehard_bitstream|   0|   2097152|     100|0.56232583|  PASSED  
# The file file_input was rewound 53 times
        diehard_opso|   0|   2097152|     100|0.83072458|  PASSED  

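I don't know exactly how many samples you need to get "better" results, but failures with only 2.5M numbers seem to be expected.

After some experimentation, the tests seem to start passing at around 120MB of random binary data: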
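If the generators can simply be run for longer, a larger file in the same layout is straightforward to produce. The following is a minimal sketch, not a definitive tool: `format_dieharder_ascii` and `park_miller` are hypothetical helper names, the header fields are copied from the example files above, and the Park-Miller recurrence is only an assumption used as sample data (the first two values in the question's file match it).

```python
def format_dieharder_ascii(values, generator="Park", seed=1, numbit=32):
    """Return file text in the layout shown in the question:
    comment header, type/count/numbit fields, then one integer per line."""
    values = list(values)
    limit = 1 << numbit
    for v in values:
        if not 0 <= v < limit:
            raise ValueError(f"{v} does not fit in {numbit} unsigned bits")
    bar = "#" + "=" * 66
    lines = [
        bar,
        f"# generator {generator}  seed = {seed}",
        bar,
        "type: d",                # decimal integers, as in the example files
        f"count: {len(values)}",  # kept equal to the number of data lines
        f"numbit: {numbit}",
    ]
    lines.extend(str(v) for v in values)
    return "\n".join(lines) + "\n"


def park_miller(seed, n):
    """Sample data only: x -> 16807 * x mod (2^31 - 1), assumed for illustration."""
    x = seed
    for _ in range(n):
        x = (16807 * x) % 2147483647
        yield x


# Print a tiny file; a real run would use tens of millions of values.
print(format_dieharder_ascii(park_miller(1, 3)), end="")
```

Writing the returned text to a file and passing it with -g 202 -f, as in the commands above, would be the intended use.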

$ dd if=/dev/urandom of=/tmp/random bs=4096 count=30000
30000+0 records in
30000+0 records out
122880000 bytes transferred in 10.873818 secs (11300538 bytes/sec)
$ du -sh /tmp/random
117M    /tmp/random
$ dieharder -a -g 201 -f /tmp/random
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |           filename             |rands/second|
 file_input_raw|                     /tmp/random|  1.11e+07  |
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.71230346|  PASSED  
# The file file_input_raw was rewound 3 times
      diehard_operm5|   0|   1000000|     100|0.62093817|  PASSED  
# The file file_input_raw was rewound 7 times
  diehard_rank_32x32|   0|     40000|     100|0.02228171|  PASSED  
# The file file_input_raw was rewound 9 times
    diehard_rank_6x8|   0|    100000|     100|0.20698623|  PASSED  
# The file file_input_raw was rewound 10 times
   diehard_bitstream|   0|   2097152|     100|0.55567887|  PASSED  
# The file file_input_raw was rewound 17 times
        diehard_opso|   0|   2097152|     100|0.20799917|  PASSED  

That corresponds to 122,880,000 / 4 = 30,720,000, so about 31M 32-bit integers.

Answer 1: (score: 1)
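The same arithmetic as a small sketch, comparing the raw file from the dd run with the 2.5M integers in the question's files. The helper name `uint32_count` is hypothetical; the only assumption is that each value occupies 4 bytes in the raw file.

```python
BYTES_PER_UINT32 = 4  # raw input is read as 32-bit words (assumption stated above)


def uint32_count(num_bytes):
    """Number of whole 32-bit integers contained in num_bytes of raw data."""
    return num_bytes // BYTES_PER_UINT32


raw_bytes = 4096 * 30000           # bs * count from the dd command
passing = uint32_count(raw_bytes)  # integers in the file whose tests passed
asked = 2_500_000                  # integers per file in the question

print(passing)                     # 30720000
print(round(passing / asked, 1))   # how many times larger than the question's files
```

By this count, the file that passed holds roughly twelve times as many integers as the files in the question.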

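The reason is that some tests need a very large amount of data; call that amount X. If your file is smaller than X, dieharder rewinds it and reads the same numbers again. A message such as

"The file file_input_raw was rewound 3 times"

means the test needed a file roughly 3 times larger. Without that, the test simply runs over your file 3 times, so there is obviously a lot of repetition and the entropy drops considerably.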
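The rewind messages can be turned into a rough estimate of how much data each test consumed. This is only a sketch under two assumptions: that the rewind counter printed during a -a run is cumulative (the counts in the first transcript of the other answer, 1, 11, 24, 30, 32, 53, only ever increase, which suggests so), and that each rewind means the whole file was read once before it. The function names are hypothetical.

```python
def per_test_rewinds(cumulative):
    """Turn cumulative rewind counts into the rewinds caused by each test."""
    previous = [0] + cumulative[:-1]
    return [now - before for before, now in zip(previous, cumulative)]


def min_numbers_consumed(rewinds, file_count):
    """Lower bound on numbers read: the file was exhausted `rewinds` times."""
    return rewinds * file_count


# Rewind counts reported for the 10M-integer example.input run.
cumulative = [1, 11, 24, 30, 32, 53]
tests = ["birthdays", "operm5", "rank_32x32", "rank_6x8", "bitstream", "opso"]

for name, r in zip(tests, per_test_rewinds(cumulative)):
    print(name, r, min_numbers_consumed(r, 10_000_000))
```

Under these assumptions, diehard_operm5 alone accounts for 10 rewinds of the 10M-integer file, which is consistent with its 1,000,000 tsamples times 100 psamples.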