Question

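I am trying to increase the number of records in my dataset in order to build a semi-supervised learning algorithm. The starting dataset has about 495 records, with 9 features and 2 targets.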
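One possible way to generate extra records, assuming the features are continuous, is to fit a multivariate Gaussian KDE to the existing feature matrix and sample from it. The sketch below uses `scipy.stats.gaussian_kde.resample`. The random matrix `X` is a hypothetical stand-in for the real 495 x 9 data, and `augment_with_kde` is an illustrative helper name. The generated rows carry no target values, so in a semi-supervised setting they could only serve as unlabeled data. With 9 dimensions and about 495 records, the KDE will also be fairly coarse.

```python
import numpy as np
from scipy.stats import gaussian_kde


def augment_with_kde(features, n_new, seed=None):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `features`.

    features: array of shape (n_records, n_features), e.g. (495, 9).
    Returns an array of shape (n_new, n_features) with no target values.
    """
    # gaussian_kde expects data with shape (n_features, n_records)
    kde = gaussian_kde(features.T)
    # resample returns shape (n_features, n_new); transpose back to rows
    return kde.resample(n_new, seed=seed).T


# Hypothetical stand-in for the real 495 x 9 feature matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(495, 9))
X_synth = augment_with_kde(X, 1000, seed=1)
print(X_synth.shape)  # (1000, 9)
```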

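So far, I have plotted the distribution of each feature to get a quick idea of what the underlying distributions might be. I have also tried to fit them with both parametric and non-parametric methods.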
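To compare parametric fits with numbers rather than by eye, one common option is the log-likelihood of the data under each fitted distribution, or the AIC derived from it (lower is better). A minimal sketch, assuming a single continuous feature column; `y` is simulated here and `fit_scores` is a hypothetical helper. Since the Gaussian and Rayleigh fits both have two parameters (`loc`, `scale`), ranking them by AIC is equivalent to ranking them by log-likelihood.

```python
import numpy as np
from scipy.stats import norm, rayleigh


def fit_scores(data, dist):
    """Fit `dist` by maximum likelihood and return (params, log-likelihood, AIC)."""
    params = dist.fit(data)
    loglik = np.sum(dist.logpdf(data, *params))
    k = len(params)  # number of fitted parameters
    aic = 2 * k - 2 * loglik
    return params, loglik, aic


rng = np.random.default_rng(42)
y = rng.normal(loc=5.0, scale=2.0, size=495)  # hypothetical feature column

results = {}
for name, dist in [("gaussian", norm), ("rayleigh", rayleigh)]:
    results[name] = fit_scores(y, dist)
    params, ll, aic = results[name]
    print(f"{name}: loglik={ll:.1f}, AIC={aic:.1f}")
```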

The question is: how can I obtain a numerical estimate of the error?
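One numerical goodness-of-fit measure that works for both branches is the Kolmogorov-Smirnov statistic D, the largest gap between the empirical CDF and the fitted CDF. A sketch with `scipy.stats.kstest` on simulated data; for the KDE, the CDF is obtained with `gaussian_kde.integrate_box_1d`, and `kde_cdf` is a hypothetical helper. When the parameters are estimated from the same data, the reported p-value is too optimistic, so D is best used to compare fits rather than as a formal test.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde, kstest

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=495)  # hypothetical stand-in for one feature

# Parametric branch: compare the data against the fitted normal CDF
params = norm.fit(y)
ks_norm = kstest(y, "norm", args=params)

# Non-parametric branch: the KDE's CDF, obtained by integrating its density
kde = gaussian_kde(y, bw_method=0.25)


def kde_cdf(values):
    """CDF of the KDE evaluated at each value (kstest passes a sorted array)."""
    return np.array([kde.integrate_box_1d(-np.inf, v) for v in values])


ks_kde = kstest(y, kde_cdf)

print(f"normal fit: D={ks_norm.statistic:.3f}, p={ks_norm.pvalue:.3f}")
print(f"KDE fit:    D={ks_kde.statistic:.3f}")
```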

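Edit: thanks to juanpa.arrivillaga, here is a summary of the question: how can I get a numerical result (e.g. the MSE) that tells me how well the density estimate fits the actual density? With the pasted code, I can only judge the fit from the plots.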
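An MSE needs a reference density, and there are two hedged options. On real data, the fitted pdf can be compared with the normalised histogram at the bin centres, although the result depends on the number of bins. On simulated data, where the true density is known, the integrated squared error (ISE) can be computed instead. The helpers `mse_vs_histogram` and `ise_vs_truth`, the integration range and the simulated normal data are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde


def mse_vs_histogram(data, pdf, bins=20):
    """MSE between a fitted pdf and the normalised histogram of `data`."""
    heights, edges = np.histogram(data, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.mean((pdf(centers) - heights) ** 2)


def ise_vs_truth(pdf_est, pdf_true, lo, hi, n_grid=2000):
    """Integrated squared error on [lo, hi] (rectangle rule on a uniform grid)."""
    grid = np.linspace(lo, hi, n_grid)
    dx = grid[1] - grid[0]
    return np.sum((pdf_est(grid) - pdf_true(grid)) ** 2) * dx


rng = np.random.default_rng(0)
true = norm(loc=5.0, scale=2.0)  # "actual" density, known only in simulation
y = true.rvs(size=495, random_state=rng)

kde = gaussian_kde(y, bw_method=0.25)
mu, sigma = norm.fit(y)


def gauss_pdf(t):
    return norm.pdf(t, mu, sigma)


for name, pdf in [("KDE", kde), ("normal", gauss_pdf)]:
    print(name, "MSE vs histogram:", mse_vs_histogram(y, pdf))
    print(name, "ISE vs true density:", ise_vs_truth(pdf, true.pdf, -5, 15))
```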

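Below are my current function and the functions I used. I searched Stack Overflow, but everything I found stopped at plotting. Thanks in advance for your help!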

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm, rayleigh, gaussian_kde


def learnTheGaussianEstimation(title, x_data, y_data, mean, variance, parametric_kde):
    """
    Given the curve, try to fit it with a Gaussian. The curve represents the behaviour
    of a single parameter over all the monitored dates.
    :param title: string to be assigned to the plot
    :param x_data: list of x-points
    :param y_data: list of y-points
    :param mean: mean of the values (unused: norm.fit computes the MLE directly)
    :param variance: variance of the values (unused)
    :param parametric_kde: boolean that selects a parametric (True) or non-parametric (False) fit
    :return: None
    """
    y_data = np.asarray(y_data, dtype=float)

    if parametric_kde:
        # Density given by the samples (pandas uses scipy's gaussian_kde internally)
        pd.Series(y_data).plot(kind="density", figsize=(9, 9), title=title, label="data")

        # Gaussian fitting - parametric; norm.fit returns (loc, scale)
        # (the original norm.fit(y_data, param=[mean,]) raises TypeError: unknown keyword)
        loc, scale = norm.fit(y_data)
        x = np.linspace(y_data.min(), y_data.max(), len(x_data))
        plt.plot(x, norm.pdf(x, loc=loc, scale=scale), "r", label="gaussian")

        # Rayleigh fitting - parametric; rayleigh.fit returns (loc, scale)
        loc, scale = rayleigh.fit(y_data)
        plt.plot(x, rayleigh.pdf(x, loc=loc, scale=scale), "g", label="rayleigh")

        plt.legend()
        plt.show()
    else:
        bandwidth = 0.25  # scalar bw_method: factor multiplying the data's std
        bins = 6

        x = np.linspace(y_data.min(), y_data.max(), 2000)

        # Same effect as overriding covariance_factor and calling _compute_covariance()
        kde = gaussian_kde(y_data, bw_method=bandwidth)

        plt.plot(x, kde(x), "r")  # estimated density function
        plt.hist(y_data, bins=bins, density=True)  # histogram ('normed' was removed in Matplotlib 3.1)
        plt.show()
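The non-parametric branch hard-codes `bandwidth = 0.25`. A score that can also guide this choice is the mean held-out log-likelihood: fit the KDE on part of the data and evaluate `gaussian_kde.logpdf` on the rest (higher is better). A minimal K-fold sketch, assuming 1-D data; `heldout_loglik`, the simulated `y` and the candidate bandwidths are hypothetical. A scalar `bw_method` is the same factor the original code sets through `covariance_factor`.

```python
import numpy as np
from scipy.stats import gaussian_kde


def heldout_loglik(data, bw, n_splits=5, seed=0):
    """Mean log-likelihood of held-out points under a KDE fitted on the rest."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), n_splits)
    scores = []
    for test_idx in folds:
        train = np.delete(data, test_idx)  # all points not in this fold
        kde = gaussian_kde(train, bw_method=bw)
        scores.append(np.mean(kde.logpdf(data[test_idx])))
    return np.mean(scores)


rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=495)  # hypothetical feature column
bandwidths = [0.05, 0.1, 0.25, 0.5, 1.0]
scores = {bw: heldout_loglik(y, bw) for bw in bandwidths}
best = max(scores, key=scores.get)
print(scores)
print("best bandwidth factor:", best)
```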

Kernel density estimation in Python: evaluation
