Question

I have a Kaggle dataset with 45,253 rows and a single column of temperatures in Kelvin for the city of Detroit. mean = 282.97, std = 11, min = 243.48, max = 308.05.

This is the result when the data is plotted as a histogram with 100 bins and density=True:

I would like to write the following two functions and see which one comes closest to the histogram:
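
Assuming the two candidates are the single normal density and the equal-weight mixture of two normal densities that the answer implements (the parameter values here are taken from that answer, not from the question itself), they could be written as:

```latex
f_1(x) = \frac{1}{\sqrt{2\pi \cdot 11^2}}
         \exp\!\left(-\frac{(x - 283)^2}{2 \cdot 11^2}\right)

f_2(x) = \frac{1}{2} \cdot \frac{1}{\sqrt{2\pi \cdot 6^2}}
         \exp\!\left(-\frac{(x - 276)^2}{2 \cdot 6^2}\right)
       + \frac{1}{2} \cdot \frac{1}{\sqrt{2\pi \cdot 6.5^2}}
         \exp\!\left(-\frac{(x - 293)^2}{2 \cdot 6.5^2}\right)
```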

Using scipy.stats.norm.pdf gives something similar to this:

I generated the image above with the following code:

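import numpy as np
from scipy.stats import norm

# dataset is assumed to be a pandas DataFrame whose Detroit column holds the temperatures in Kelvin
x = np.linspace(dataset.Detroit.min(), dataset.Detroit.max(), 1001)
P_norm = norm.pdf(x, dataset.Detroit.mean(), dataset.Detroit.std())

plot_pdf_single(x, P_norm)  # plotting helper not shown in the post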

However, whenever I try to implement either of the two approximation functions, every value of P_norm comes out as 0 or inf.

This is what I tried:

P_norm = [(1.0/(np.sqrt(2.0*pi*(std*std))))*np.exp(((-x_i-mu)*(-x_i-mu))/(2.0*(std*std))) for x_i in x]

I also broke it down for a single x_i:

part1 = ((-x[0] - mu)*(-x[0] - mu)) / (2.0*(std * std))
part2 = np.exp(part1)
part3 = 1.0 / (np.sqrt(2.0 * pi * (std*std)))
total = part3*part2

I get the following values for part1, part2, part3 and total:

1145.3913234604413
inf
0.036267480036493875
inf

Answer 1

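Both equations use the same normal density formula. In the attempt above, x_i is negated before mu is subtracted and the square itself is never negated, so the exponent is large and positive and np.exp overflows to inf. The minus sign belongs in front of the squared difference: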

import numpy as np

def pdf_approximation(x_i, mu, std):
    return (1.0 / (np.sqrt(2.0 * np.pi * (std*std)))) * np.exp((-(x_i-mu)*(x_i-mu)) / (2.0 * (std*std)))
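
To see the effect of the sign placement in isolation, here is a minimal sketch that evaluates both exponents for one point. It uses the rounded summary statistics quoted in the question rather than the real dataset, so the numbers only approximately match the ones the asker printed; the variable names are illustrative.

```python
import numpy as np

mu, std = 282.97, 11.0   # summary statistics quoted in the question
x0 = 243.48              # minimum temperature, i.e. x[0] in the question

# Exponent as written in the question: x0 is negated before mu is subtracted,
# and the square is never negated, so the result is large and positive.
buggy_exponent = ((-x0 - mu) * (-x0 - mu)) / (2.0 * std * std)

# Exponent of the normal density: minus the squared distance from the mean.
fixed_exponent = -((x0 - mu) ** 2) / (2.0 * std * std)

with np.errstate(over="ignore"):          # silence the overflow warning
    buggy_value = np.exp(buggy_exponent)  # too large for float64, becomes inf
fixed_value = np.exp(fixed_exponent)      # small but finite

print(buggy_exponent, buggy_value)
print(fixed_exponent, fixed_value)
```

The first exponent is above 1000, far beyond the roughly 709 that float64 can exponentiate, which is why every entry turned into inf.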

The code for the first approximation is:

mu = 283
std = 11

P_norm = np.array([pdf_approximation(x_i, mu, std) for x_i in x])

plot_pdf_single(x, P_norm)
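
NumPy applies arithmetic elementwise, so the list comprehension could also be replaced by a single array expression. The following is a minimal sketch, assuming a grid that spans the minimum and maximum quoted in the question; pdf_vectorized is a hypothetical name, not something from the answer.

```python
import numpy as np

def pdf_vectorized(x, mu, std):
    """Normal density evaluated elementwise on an array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2.0 * std ** 2)) / np.sqrt(2.0 * np.pi * std ** 2)

# Grid spanning the min and max temperatures quoted in the question.
x = np.linspace(243.48, 308.05, 1001)
P_norm = pdf_vectorized(x, 283, 11)

# Sanity check: summed over a wide enough range, a density has area close to 1.
wide = np.linspace(283 - 8 * 11, 283 + 8 * 11, 20001)
dx = wide[1] - wide[0]
area = np.sum(pdf_vectorized(wide, 283, 11)) * dx

print(P_norm.max(), area)
```

All values are finite and the peak sits near 1 / (11 * sqrt(2 * pi)), the same constant the asker computed as part3.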

The code for the second approximation is:

mu1 = 276
std1 = 6
mu2 = 293
std2 = 6.5

P_norm = np.array([(pdf_approximation(x_i, mu1, std1) * 0.5) + (pdf_approximation(x_i, mu2, std2) * 0.5) for x_i in x])

plot_pdf_single(x, P_norm)
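
One possible way to judge which function is closer to the histogram is to compare each density with the histogram heights at the bin centers. The sketch below uses mean squared error as the measure, which is a choice made here and not something the post prescribes. The real dataset is not available, so the temperatures are synthetic and drawn from the mixture itself; the outcome is therefore illustrative only and says nothing about the actual Detroit data.

```python
import numpy as np

def normal_pdf(x, mu, std):
    """Normal density, evaluated elementwise."""
    return np.exp(-((x - mu) ** 2) / (2.0 * std ** 2)) / np.sqrt(2.0 * np.pi * std ** 2)

def mixture_pdf(x):
    """Equal-weight mixture with the parameters used in the answer."""
    return 0.5 * normal_pdf(x, 276, 6) + 0.5 * normal_pdf(x, 293, 6.5)

# ASSUMPTION: synthetic stand-in for dataset.Detroit, sampled from the mixture.
rng = np.random.default_rng(0)
n = 45253
from_first = rng.random(n) < 0.5
temps = np.where(from_first, rng.normal(276, 6, n), rng.normal(293, 6.5, n))

# Same histogram settings as in the question: 100 bins, density=True.
density, edges = np.histogram(temps, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Mean squared error between histogram heights and each candidate density.
mse_single = np.mean((density - normal_pdf(centers, 283, 11)) ** 2)
mse_mixture = np.mean((density - mixture_pdf(centers)) ** 2)

print(mse_single, mse_mixture)
```

With the real column, temps would be replaced by dataset.Detroit.to_numpy() (assuming a pandas DataFrame), and the smaller error would indicate the closer fit under this particular measure.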
