Lower bound for multivariate kernel density estimation in Python

Posted: 2018-02-15 12:31:59

Tags: python scipy scikit-learn seaborn kernel-density

I have two-dimensional data whose joint distribution I want to estimate using kernel density estimation in Python. The only problem I am facing is how to incorporate a lower bound into the estimate; I have tried every option I could find (scipy.stats, sklearn.neighbors). For visualization, seaborn works around the issue with its xlim and ylim arguments. However, I later need to resample from the estimated distribution, so the truncation is critical: the sampled values on the y axis must be positive. Any hint would be appreciated, because I have not found an option for this in Python so far. Thank you.
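One workaround I am considering, since neither scipy nor scikit-learn seems to support truncated KDE directly, is rejection sampling: draw from the unbounded KDE and discard draws that violate the bound, which effectively samples the KDE renormalized on y >= 0. A minimal sketch, assuming kernel is a fitted scipy.stats.gaussian_kde as constructed in the code below (the helper name resample_truncated is mine):

import numpy as np
import scipy.stats as st

def resample_truncated(kernel, size, ymin=0.0):
    """Draw `size` samples from a gaussian_kde, keeping only y >= ymin."""
    out = np.empty((2, 0))
    while out.shape[1] < size:
        draw = kernel.resample(size)   # unbounded draws, shape (2, size)
        keep = draw[1] >= ymin         # enforce the lower bound on y
        out = np.hstack([out, draw[:, keep]])
    return out[:, :size]

# usage: new_samples = resample_truncated(kernel, 10)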

Below is my code along with some of the outputs: the first figure corresponds to seaborn, while the second corresponds to gaussian_kde in SciPy.

[fig1: seaborn jointplot KDE output]

[fig2: SciPy gaussian_kde contour output]

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

x = sub_data['x']
y = sub_data['y']
xmin, xmax = 90, 450
ymin, ymax = 0, 2
# [x, y] is the data
# Perform the kernel density estimate using scipy
xx, yy = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([xx.ravel(), yy.ravel()])
values = np.vstack([x, y])
kernel = st.gaussian_kde(values)
new_samples_spy = kernel.resample(10)  # resampled y values can be negative
f = np.reshape(kernel(positions).T, xx.shape)

fig = plt.figure()
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
# Filled contour plot
cfset = ax.contourf(xx, yy, f, cmap='Blues')
# Contour lines
cset = ax.contour(xx, yy, f, colors='k')
# Label the contour levels
ax.clabel(cset, inline=1, fontsize=10)
ax.set_xlabel('x', fontsize=14)
ax.set_ylabel('y', fontsize=14)
plt.show()
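Another possibility I came across is the reflection (mirroring) trick for boundary correction: fit the KDE on the data augmented with its mirror image about y = 0, then fold resampled y values back with abs(). A rough sketch (reflected_kde is my own helper, not a library function):

def reflected_kde(x, y):
    """Fit gaussian_kde on data mirrored about y = 0 (boundary correction)."""
    x_aug = np.concatenate([x, x])
    y_aug = np.concatenate([y, -y])    # reflect y about the boundary
    return st.gaussian_kde(np.vstack([x_aug, y_aug]))

kernel_refl = reflected_kde(x, y)
draw = kernel_refl.resample(10)
draw[1] = np.abs(draw[1])              # fold negative y back onto y >= 0

On y >= 0 the resulting density is twice the mirrored KDE, which should roughly remove the bias near the y = 0 boundary, though I am not sure how well it suits my data.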
# 2nd way, using sklearn
from sklearn.neighbors import KernelDensity

def kde2D(x, y, bandwidth, xbins=100j, ybins=100j, **kwargs):
    """Build a 2D kernel density estimate (KDE)."""
    # Create a grid of sample locations (default: 100x100)
    xx, yy = np.mgrid[x.min():x.max():xbins, y.min():y.max():ybins]
    xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
    xy_train = np.vstack([y, x]).T
    kde_skl = KernelDensity(bandwidth=bandwidth, **kwargs)
    kde_skl.fit(xy_train)
    # score_samples() returns the log-likelihood of the samples
    z = np.exp(kde_skl.score_samples(xy_sample))
    return xx, yy, np.reshape(z, xx.shape)


fig = plt.figure(2)
ax = fig.gca()
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
xx, yy, zz = kde2D(x, y, 1.0)

# Filled contour plot (note: zz from kde2D, not f from the scipy block)
cfset = ax.contourf(xx, yy, zz, cmap='Blues')
# Contour lines
cset = ax.contour(xx, yy, zz, colors='k')
# Label the contour levels
ax.clabel(cset, inline=1, fontsize=10)
ax.set_xlabel('x', fontsize=14)
ax.set_ylabel('y', fontsize=14)
plt.show()
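If the scikit-learn route is preferable, KernelDensity also exposes a sample() method (for the gaussian and tophat kernels), so the same rejection idea applies there. A minimal sketch, fitting on [y, x] columns as in kde2D above:

kde_skl = KernelDensity(bandwidth=1.0).fit(np.vstack([y, x]).T)
samples = kde_skl.sample(1000)         # shape (1000, 2), columns are [y, x]
samples = samples[samples[:, 0] >= 0]  # keep only draws with y >= 0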

# Third way, using seaborn
import seaborn as sns

sn = sns.jointplot(x="x", y="y", data=sub_data, kind="kde",
                   xlim=(0, 500), ylim=(0, 1.2))
sn.set_axis_labels('x', 'y', fontsize=14)
plt.show()

0 Answers:

No answers yet.