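I've been told that rand() % n produces biased results, so I wrote this code to check it. It generates s numbers in the range 1 to l and then sorts them by number of occurrences.
Running the code below, I can see the expected bias from rand():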
#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <iostream>
#include <random>
#include <utility>   // swap
using namespace std;

struct vec_struct{
    int num;       // generated value (later: the distinct value)
    int count;     // how often the value occurred
    double ratio;  // share of all samples, in percent
};

// Bubble sort, ascending by value.
void num_sort(vec_struct v[], int n){
    for (int i = 0; i < n-1; i++){
        for (int k = 0; k < n-1-i; k++){
            if (v[k].num > v[k+1].num) swap(v[k], v[k+1]);
        }
    }
}

// Bubble sort, descending by number of occurrences.
void count_sort(vec_struct v[], int n){
    for (int i = 0; i < n-1; i++){
        for (int k = 0; k < n-1-i; k++){
            if (v[k].count < v[k+1].count) swap(v[k], v[k+1]);
        }
    }
}

int main(){
    srand(static_cast<unsigned>(time(0)));
    random_device rnd;
    int s, l, b, c = 1;
    cout << "How many numbers to generate? ";
    cin >> s;
    cout << "Generate " << s << " numbers ranging from 1 to? ";
    cin >> l;
    cout << "Use rand or mt19937? [1/2] ";
    cin >> b;
    // Reject input that would leave the array empty or uninitialized.
    if (!cin || s <= 0 || l <= 0 || (b != 1 && b != 2)) return 1;
    vec_struct * vec = new vec_struct[s];
    mt19937 engine(rnd());
    uniform_int_distribution <int> dist(1, l);
    if (b == 1){
        for (int i = 0; i < s; i++){
            vec[i].num = (rand() % l) + 1;
        }
    } else {
        for (int i = 0; i < s; i++){
            vec[i].num = dist(engine);
        }
    }
    num_sort(vec, s);
    // Collapse runs of equal values into (value, count, ratio) entries
    // stored at the front of the array.
    int j = 0;  // number of distinct values found
    for (int i = 0; i < s; i++){
        // The i + 1 < s check avoids reading past the end of the array.
        if (i + 1 < s && vec[i].num == vec[i+1].num){
            c++;
        } else {
            vec[j].num = vec[i].num;
            vec[j].count = c;
            vec[j].ratio = ((double)c/s)*100;
            j++;
            c = 1;
        }
    }
    // Only the first j entries are valid; j can be smaller than l.
    count_sort(vec, j);
    if (j >= 20){
        cout << endl << "Showing the 10 most common numbers" << endl;
        for (int i = 0; i < 10; i++){
            cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
        }
        cout << endl << "Showing the 10 least common numbers" << endl;
        for (int i = j-10; i < j; i++){
            cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
        }
    } else {
        for (int i = 0; i < j; i++){
            cout << vec[i].num << "\t" << vec[i].count << "\t" << vec[i].ratio << "%" << endl;
        }
    }
    delete[] vec;
}
$ ./rnd_test
How many numbers to generate? 10000
Generate 10000 numbers ranging from 1 to? 50
Use rand or mt19937? [1/2] 1
Showing the 10 most common numbers
17 230 2.3%
32 227 2.27%
26 225 2.25%
25 222 2.22%
3 221 2.21%
10 220 2.2%
35 218 2.18%
5 217 2.17%
13 215 2.15%
12 213 2.13%
Showing the 10 least common numbers
40 187 1.87%
7 186 1.86%
39 185 1.85%
42 184 1.84%
43 184 1.84%
34 182 1.82%
21 175 1.75%
22 175 1.75%
18 173 1.73%
44 164 1.64%
However, I get equivalent results with mt19937! What is wrong here? Shouldn't the distribution be uniform, or is the test useless?
Answer 0: (score: 1)
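No, it should not be perfectly uniform, so the output above is not evidence of any error.
The numbers are random, so the counts should be fairly uniform, but not exactly equal.
Specifically, you would expect each number to come up about 10000/50 = 200 times, with a standard deviation of roughly sqrt(200), which is about 14. Across 50 numbers you can expect the extremes to differ from the mean by about 2 standard deviations, i.e. +/- 28.
The bias caused by reducing rand() with the modulus is smaller than this, so you would need many more samples to detect it.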
Answer 1: (score: 0)
You have to use more samples for this kind of random number test. I tried 50000 with your code, and the result is:
How many numbers to generate? 50000
Generate 50000 numbers ranging from 1 to? 50
Use rand or mt19937? [1/2] 2
Showing the 10 most common numbers
36 1054 2.108%
14 1051 2.102%
11 1048 2.096%
27 1045 2.09%
2 1044 2.088%
33 1035 2.07%
21 1034 2.068%
48 1034 2.068%
34 1030 2.06%
39 1030 2.06%
Showing the 10 least common numbers
47 966 1.932%
16 961 1.922%
38 960 1.92%
28 959 1.918%
8 958 1.916%
10 958 1.916%
30 958 1.916%
32 958 1.916%
18 953 1.906%
23 953 1.906%
Answer 2: (score: -1)
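As far as I can tell from http://www.cplusplus.com/reference/random/mersenne_twister_engine/ mt19937 will suffer from the same bias as rand() if its raw output is reduced with %.
The bias arises because rand() generates an integer in the range [0, RAND_MAX]; when you take the modulus, the smaller results become slightly more likely (unless your divisor divides RAND_MAX + 1 evenly).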
Consider:
Range [0-74]:
0 % 50 = 0
40 % 50 = 40
50 % 50 = 0
74 % 50 = 24
(numbers less than 25 occur twice)