Question

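I am using Python to sample a discrete distribution with the rejection (accept-reject) Monte Carlo method. Since the curve resembles a power law, I decided to put a simple envelope around it (with a step at x = 77) to make the code faster. However, the code does not behave as expected: compared with the enveloped version, the histogram over the whole region looks like a simple rectangle.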

x is data.rank, which ranges from 0 to 5000; y is data.freq.

Can anyone spot what is wrong with the code? The two histograms should be equal. Thanks!

#!/usr/bin/env python                        

import numpy as np
import matplotlib.pyplot as plt
import random
import pandas as pd

# Read data
data = pd.read_csv('data.csv')

# Rejection-sampling MC (with envelope function gx at rank=77)
N = 1000
M = 1.0001
cutoff = 77
gx = np.ones(cutoff) * max(data['freq'])
gx = np.append( gx, np.ones(len(data['rank'])-cutoff) * data['freq'][cutoff-1] )
histx = []
while N > 0: 
    rx = random.randint(0,len(gx)-1)
    ry = random.uniform(0,1)
    if ry < data['freq'][rx]/(M*gx[rx]):
        histx.append(rx)
        N += -1
plt.hist(histx, bins=100, histtype='stepfilled', color='b',label='Enveloped (Fast)')

# Rejection-sampling MC (no envelope: constant bound at max(freq))
N2 = 1000
histy = []
while N2 > 0: 
    rx2 = random.randint(0,len(gx)-1)
    ry2 = random.uniform(0,max(data['freq']))
    if ry2 < data['freq'][rx2]:
        histy.append(rx2)
        N2 += -1
plt.hist(histy, bins=100, histtype='stepfilled', color='r', alpha=0.5, label='Normal (Slow)')
plt.legend()
plt.show()

Answer 1

The two loops draw different samples.

In the second case, you effectively have

ry2 = random.uniform(0.0, 1.0)
if ry2 < data['freq'][rx2]/max(data['freq']):
    histy.append(rx2)  # accept

In the first case, you have

ry = random.uniform(0.0, 1.0)
if ry < data['freq'][rx]/(M*gx[rx]):
    histx.append(rx)  # accept

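Wherever gx[rx] is not equal to max(data['freq']), that is, for indices at or past cutoff, the two acceptance tests differ (M = 1.0001 is close enough to 1 to ignore). Because rx is drawn uniformly in both loops, the first loop ends up sampling in proportion to freq/gx rather than freq.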
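The difference can be checked without any random numbers by comparing the distributions each acceptance test implies under a uniform proposal. This is only a sketch: the `freq` array below is a made-up power-law-like stand-in for `data['freq']`, not the asker's data.

```python
import numpy as np

# Hypothetical frequencies standing in for data['freq'] (assumption).
freq = 1.0 / np.arange(1, 201)
cutoff = 77

# Two-step envelope built as in the question: max(freq) before the
# cutoff, freq[cutoff-1] from the cutoff onward.
gx = np.where(np.arange(freq.size) < cutoff, freq.max(), freq[cutoff - 1])

# Distribution we actually want to sample.
target = freq / freq.sum()

# With rx drawn uniformly, accepted samples are distributed in
# proportion to the acceptance probability itself.
loop_1 = (freq / gx) / (freq / gx).sum()                  # enveloped loop
loop_2 = (freq / freq.max()) / (freq / freq.max()).sum()  # plain loop

print(np.allclose(loop_2, target))  # plain loop reproduces the target
print(np.allclose(loop_1, target))  # enveloped loop does not
```

Past the cutoff, `freq / gx` is inflated by the factor `freq.max() / freq[cutoff - 1]`, which is why the tail is over-represented in the "fast" histogram.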

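Some advice: try to always use U(0,1) RNG calls; that makes it easier to spot, replace, or change the RNG. Second, when MC performance matters (after correctness, of course), compute things like max(data['freq']) or len(gx) outside the main loop.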
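The answer does not spell out a fix, so the following is one possible sketch rather than the answerer's method. It uses the standard form of rejection sampling, in which the proposal is drawn in proportion to the envelope `gx` instead of uniformly, and it follows the advice above: only U(0,1) draws for the accept step, and all constants computed once outside the loop. The function name `sample_with_envelope` and the `freq` array are hypothetical.

```python
import numpy as np

def sample_with_envelope(freq, gx, n, rng):
    """Draw n indices with probability proportional to freq, given gx >= freq."""
    freq = np.asarray(freq, dtype=float)
    gx = np.asarray(gx, dtype=float)
    # Hoisted out of the sampling loop: proposal and acceptance tables.
    proposal_p = gx / gx.sum()
    accept_p = freq / gx
    out = []
    while len(out) < n:
        # Propose indices in proportion to the envelope ...
        rx = rng.choice(freq.size, size=n, p=proposal_p)
        # ... and accept each with probability freq/gx using U(0,1) draws.
        u = rng.random(n)
        out.extend(rx[u < accept_p[rx]].tolist())
    return np.array(out[:n])

# Hypothetical stand-in for data['freq'] (assumption, not the asker's data).
freq = 1.0 / np.arange(1, 201)
cutoff = 77
gx = np.where(np.arange(freq.size) < cutoff, freq.max(), freq[cutoff - 1])

rng = np.random.default_rng(0)
samples = sample_with_envelope(freq, gx, 20000, rng)

# Compare the empirical frequencies with the normalised target.
empirical = np.bincount(samples, minlength=freq.size) / samples.size
print(np.abs(empirical - freq / freq.sum()).max())
```

In practice `rng.choice(freq.size, size=n, p=freq / freq.sum())` would sample the target directly; the rejection step is kept here only to show where the original enveloped loop departs from the method.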

Discrete rejection sampling Monte Carlo using Python
