Question

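I need to compute the area where two functions overlap. I use normal distributions in this particular simplified example, but I need a more general procedure that also works for other functions.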

See the image below to get an idea of what I mean; the red area is what I'm after:

[image: two overlapping curves, with the overlapping region shaded red]

Here is my MWE so far:

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Generate random data, normally distributed.
a = np.random.normal(1., 0.1, 1000)
b = np.random.normal(1., 0.1, 1000)

# Obtain KDE estimates for each set of data.
xmin, xmax = -1., 2.
x_pts = np.mgrid[xmin:xmax:1000j]
# Kernels.
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)
# KDEs for plotting.
kde_a = np.reshape(ker_a(x_pts).T, x_pts.shape)
kde_b = np.reshape(ker_b(x_pts).T, x_pts.shape)


# Random sample from a KDE distribution.
sample = ker_a.resample(size=1000)

# Compute the points below which to integrate.
iso = ker_b(sample)

# Filter the sample.
insample = ker_a(sample) < iso

# As per Monte Carlo, the integral is equivalent to the
# probability of drawing a point that gets through the
# filter.
integral = insample.sum() / float(insample.shape[0])

print(integral)

plt.xlim(0.4,1.9)
plt.plot(x_pts, kde_a)
plt.plot(x_pts, kde_b)

plt.show()

I apply Monte Carlo to obtain the integral.

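The problem with this method is that when I evaluate the sampled points in either distribution with ker_b(sample) (or ker_a(sample)), I get values that lie exactly on the KDE curve. Because of this, even clearly overlapping distributions, which should return a common/overlapped area very close to 1, return small values instead (the total area under either curve is 1, since they are probability density estimates).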

How can I fix this code so that it gives the expected values?
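One possible reading of the symptom, offered as a sketch rather than a definitive diagnosis: the filter ker_a(sample) < ker_b(sample) estimates the probability that a point drawn from a falls where b's density is higher, which is not the overlap area. The overlap can instead be written as an expectation over draws from a, namely the mean of min(1, g(x)/f(x)), because the integral of min(f, g) equals the integral of f * min(1, g/f). The names mc_overlap and n_draws below are hypothetical, and the seed is only there to make the run reproducible.

```python
import numpy as np
from scipy import stats

np.random.seed(0)  # assumption: fixed seed, only for reproducibility

# Same setup as the MWE: two samples from the same normal distribution.
a = np.random.normal(1., 0.1, 1000)
b = np.random.normal(1., 0.1, 1000)
ker_a = stats.gaussian_kde(a)
ker_b = stats.gaussian_kde(b)


def mc_overlap(ker_f, ker_g, n_draws=5000):
    """Monte Carlo estimate of the integral of min(f, g).

    Draws x from f and averages min(1, g(x) / f(x)).
    """
    sample = ker_f.resample(size=n_draws)  # shape (1, n_draws)
    f_vals = ker_f(sample)
    g_vals = ker_g(sample)
    return float(np.mean(np.minimum(1.0, g_vals / f_vals)))


# What the original filter measures: the fraction of draws with f < g.
sample = ker_a.resample(size=5000)
filter_fraction = float(np.mean(ker_a(sample) < ker_b(sample)))

# Ratio-based estimate of the overlap area.
estimate = mc_overlap(ker_a, ker_b)

print(filter_fraction, estimate)
```

Since min(1, g/f) is never smaller than the indicator of f < g, the ratio-based estimate is always at least as large as the filtered fraction, and it cannot exceed 1.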

This is how I applied Zhenya's answer:

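from scipy.integrate import quad

# Calculate overlap between the two KDEs.
def y_pts(pt):
    # gaussian_kde returns a length-1 array for a scalar input.
    y_pt = min(ker_a(pt)[0], ker_b(pt)[0])
    return y_pt
# Store overlap value; quad returns (integral, absolute error estimate).
overlap, abserr = quad(y_pts, -1., 2.)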

Answer 1

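The red area on the plot is the integral of min(f(x), g(x)), where f and g are your two functions, green and blue. To evaluate the integral, you can use any of the integrators from scipy.integrate (quad is the default one, I'd say), or an MC integrator, of course, but I don't quite see the point of that.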
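As a minimal sketch of this idea for arbitrary callables, assuming both functions accept a scalar and return a scalar or a length-1 array: the helper name overlap_area and the integration limits are hypothetical, and two analytic normal densities are used here only because their overlap has a known closed form to check against.

```python
import numpy as np
from scipy import integrate, stats


def overlap_area(f, g, lower, upper):
    """Integrate min(f(x), g(x)) over [lower, upper] with quad."""
    def lower_envelope(x):
        # .item() turns a NumPy scalar or a length-1 array into a float.
        return np.minimum(f(x), g(x)).item()

    # quad returns (integral, absolute error estimate).
    value, abs_err = integrate.quad(lower_envelope, lower, upper, limit=200)
    return value, abs_err


# Assumption for the check: unit-variance normals centred at 0 and 1.
f = stats.norm(loc=0.0, scale=1.0).pdf
g = stats.norm(loc=1.0, scale=1.0).pdf

area, err = overlap_area(f, g, -8.0, 9.0)

# The curves cross at x = 0.5, so the overlap is 2 * Phi(-0.5).
expected = 2.0 * stats.norm.cdf(-0.5)
print(area, expected)
```

The same helper should work with the KDE objects from the question, e.g. overlap_area(ker_a, ker_b, -1., 2.), since gaussian_kde returns a length-1 array for a scalar input.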

Answer 2

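I think another solution would be to multiply the two curves and then take the integral. You may want to apply some kind of normalization. The analogy is orbital overlap in chemistry: https://en.wikipedia.org/wiki/Orbital_overlap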
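A rough sketch of what this could look like, with one possible normalization chosen as an assumption (dividing by the square root of the product of the two self-overlaps, so identical curves score 1). Note that this measures similarity between the curves and is a different quantity from the red area in the question. The names trapezoid_area and product_overlap are hypothetical.

```python
import numpy as np
from scipy import stats


def trapezoid_area(y, x):
    """Trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def product_overlap(f_vals, g_vals, x):
    """Normalized overlap: int(f*g) / sqrt(int(f^2) * int(g^2))."""
    fg = trapezoid_area(f_vals * g_vals, x)
    ff = trapezoid_area(f_vals * f_vals, x)
    gg = trapezoid_area(g_vals * g_vals, x)
    return fg / np.sqrt(ff * gg)


# Assumption for the check: unit-variance normals centred at 0 and 1.
x = np.linspace(-10.0, 11.0, 20001)
f_vals = stats.norm.pdf(x, loc=0.0, scale=1.0)
g_vals = stats.norm.pdf(x, loc=1.0, scale=1.0)

s_fg = product_overlap(f_vals, g_vals, x)
s_ff = product_overlap(f_vals, f_vals, x)

# For equal-variance normals this ratio is exp(-d**2 / (4 * sigma**2)).
print(s_fg, np.exp(-0.25), s_ff)
```

With the KDEs from the question, f_vals and g_vals could be replaced by ker_a(x_pts) and ker_b(x_pts) evaluated on the same grid.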

Compute the overlapping area of two functions
