Understanding the output of scipy.stats.multivariate_normal

Asked: 2018-09-27 11:42:09

Tags: python scipy

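I am trying to build a multidimensional Gaussian model with scipy.stats.multivariate_normal. I want to use the output of scipy.stats.multivariate_normal.pdf() to decide whether a test value fits reasonably well within the observed distribution.

As I understand it, a higher value means a better fit to the given model, and a lower value means a worse fit.

However, in my dataset I see very large PDF(x) results, which makes me wonder whether I have understood things correctly. The area under the PDF curve must be 1, so very large values are hard to make sense of.

For example, consider: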

x = [-0.0007569417915494715, -0.01394295997613827, 0.000982078369890444, -0.03633664354397629, -0.03730583036106844, 0.013920453054506978, -0.08115836865224338, -0.07208494497398354, -0.06255237023298793, -0.0531888840386906, -0.006823760545565131]

mean = [0.01663645201261102, 0.07800335614699873, 0.016291452384234965, 0.012042931155488702, 0.0042637244100103885, 0.016531331606477996, -0.021702714746699842, -0.05738646649459681, 0.00921296058625439, 0.027940994009345254, 0.07548111758006244]

covariance = [[0.07921927017771506, 0.04780185747873293, 0.0788086850274493, 0.054129466248481264, 0.018799028456661045, 0.07523731808137141, 0.027682748950487425, -0.007296954729572955, 0.07935165417756569, 0.0569381100965656, 0.04185848489472492], [0.04780185747873293, 0.052300105044833595, 0.047749467098423544, 0.03254872837949123, 0.010582358713999951, 0.045792252383799206, 0.01969282984717051, -0.006089301208961258, 0.05067712814145293, 0.03146214776997301, 0.04452949330387575], [0.0788086850274493, 0.047749467098423544, 0.07841809405745602, 0.05374461924031552, 0.01871005609017673, 0.07487015790787396, 0.02756781074862818, -0.007327131572569985, 0.07895548129950304, 0.056417456686115544, 0.04181063355048408], [0.054129466248481264, 0.03254872837949123, 0.05374461924031552, 0.04538801863296238, 0.015795381235224913, 0.05055944754764062, 0.02017033995851422, -0.006505939129684573, 0.05497361331950649, 0.043858860182247515, 0.029356699144606032], [0.018799028456661045, 0.010582358713999951, 0.01871005609017673, 0.015795381235224913, 0.016260640022897347, 0.015459548918222347, 0.0064542528152879705, -0.0016656858963383602, 0.018761682220822192, 0.015361512546799405, 0.009832025009280924], [0.07523731808137141, 0.045792252383799206, 0.07487015790787396, 0.05055944754764062, 0.015459548918222347, 0.07207012779105286, 0.026330967917717253, -0.006907504360835279, 0.0753380831201204, 0.05335128471397023, 0.03998397595850863], [0.027682748950487425, 0.01969282984717051, 0.02756781074862818, 0.02017033995851422, 0.0064542528152879705, 0.026330967917717253, 0.020837940236441078, -0.003320408544812026, 0.027859582829638897, 0.01967636950969646, 0.017105000942890598], [-0.007296954729572955, -0.006089301208961258, -0.007327131572569985, -0.006505939129684573, -0.0016656858963383602, -0.006907504360835279, -0.003320408544812026, 0.024529061074105817, -0.007869287828047853, -0.006228903058681195, -0.0058974553248417995], [0.07935165417756569, 0.05067712814145293, 
0.07895548129950304, 0.05497361331950649, 0.018761682220822192, 0.0753380831201204, 0.027859582829638897, -0.007869287828047853, 0.08169291677188911, 0.05731196406065222, 0.04450058445993234], [0.0569381100965656, 0.03146214776997301, 0.056417456686115544, 0.043858860182247515, 0.015361512546799405, 0.05335128471397023, 0.01967636950969646, -0.006228903058681195, 0.05731196406065222, 0.05064023101024737, 0.02830810316675855], [0.04185848489472492, 0.04452949330387575, 0.04181063355048408, 0.029356699144606032, 0.009832025009280924, 0.03998397595850863, 0.017105000942890598, -0.0058974553248417995, 0.04450058445993234, 0.02830810316675855, 0.040658283674780395]]

For these, if I compute y = multivariate_normal.pdf(x, mean, covariance), the result is 342562705.3859754.

How is this possible? Am I missing something?

Thanks.

1 Answer:

Answer 0 (score: 0)

That's fine. A probability density function can be greater than 1 at a specific point. It is the integral that must equal 1.
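To make this concrete, here is a minimal sketch in one dimension. The standard deviation of 0.01 is an illustrative assumption and is not taken from the question's data. The density at the mean comes out far above 1, while the numerically integrated area is still 1.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 0.01  # assumed, illustrative standard deviation (not from the question)
dist = norm(loc=0.0, scale=sigma)

# Density at the mean: 1 / (sigma * sqrt(2 * pi)), roughly 39.89 for sigma = 0.01
peak = dist.pdf(0.0)

# Integrate the density over +/- 10 standard deviations, which holds
# essentially all of the probability mass
area, _ = quad(dist.pdf, -10 * sigma, 10 * sigma)

print(f"density at the mean: {peak:.4f}")
print(f"area under the curve: {area:.6f}")
```

The narrower the distribution, the taller the peak has to be for the area to stay at 1.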

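The idea that pdf < 1 is correct for discrete variables, where the function (a probability mass function) gives probabilities directly. For continuous variables, however, the pdf is not a probability: it is a value that must be integrated to yield a probability. That is, its integral over all dimensions, from minus infinity to infinity, equals 1.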
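The same effect in several dimensions can be sketched as follows. The 2-D mean and covariance are hypothetical values chosen for illustration, not the asker's data. For a multivariate normal, the density at the mean is (2π)^(-k/2) · det(Σ)^(-1/2), so a covariance matrix with a small determinant gives a large peak density. When densities are only being compared with each other, working with `logpdf` keeps the numbers in a manageable range.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-D example (not the asker's data): small, correlated variances
mean = np.array([0.0, 0.0])
cov = np.array([[1e-4, 5e-5],
                [5e-5, 1e-4]])

k = mean.size

# Closed-form density at the mean: (2*pi)^(-k/2) * det(cov)^(-1/2)
peak_formula = (2 * np.pi) ** (-k / 2) * np.linalg.det(cov) ** -0.5

# The same quantity as computed by SciPy
peak_scipy = multivariate_normal.pdf(mean, mean=mean, cov=cov)

# Log-density of an arbitrary test point; exp(log_density) recovers the pdf
x = np.array([0.01, -0.01])
log_density = multivariate_normal.logpdf(x, mean=mean, cov=cov)

print(f"peak density (formula): {peak_formula:.2f}")
print(f"peak density (scipy):   {peak_scipy:.2f}")
print(f"log-density at x:       {log_density:.4f}")
```

With 11 dimensions and variances well below 1, as in the question, the determinant of the covariance can be extremely small, which is consistent with a density in the hundreds of millions.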