
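A regressor does not produce probabilities. In regression, the only output you get is the predicted value, which is what makes it regression, so no regressor can give you a prediction probability. That exists only in classification.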

As mentioned before, there is no probability in regression.

However, you could add a confidence interval to that regression to see whether your regression can be trusted.
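One rough way to put an interval around a point prediction is to look at the residuals on held-out data. The sketch below is only an illustration: the data is synthetic, the straight-line fit from np.polyfit is a stand-in for whatever regressor you actually use, and predict_interval is a hypothetical helper name. Strictly speaking it gives a prediction interval rather than a confidence interval, and it assumes the noise level is the same everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
x = rng.uniform(0, 10, size=400)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Split into a training part and a held-out calibration part.
x_train, y_train = x[:300], y[:300]
x_cal, y_cal = x[300:], y[300:]

# Stand-in regressor: a straight-line least-squares fit.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x_new):
    """Point prediction of the stand-in model."""
    return slope * np.asarray(x_new, dtype=float) + intercept


# Interval half-width: 95th percentile of the absolute held-out residuals.
half_width = np.quantile(np.abs(y_cal - predict(x_cal)), 0.95)


def predict_interval(x_new):
    """Return (lower, prediction, upper) for each value in x_new."""
    centre = predict(x_new)
    return centre - half_width, centre, centre + half_width


lower, centre, upper = predict_interval([2.0, 5.0])
print(lower, centre, upper)
```

A narrow interval relative to the spread of y suggests the regression can be trusted more; a wide one suggests the point prediction alone tells you little.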

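One thing to keep in mind, though, is that the variance may not be the same along the data. Say you study a time-based phenomenon: specifically, you want the temperature (y) inside an oven after a time (x), in seconds for example. At x = 0 s the temperature is 20 °C; you start heating it and want to know how it evolves, so that you can predict the temperature after x seconds. The variance after 20 seconds and after 5 minutes may be the same, or it may be completely different. When it differs, this is called heteroscedasticity.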

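If you want to use a confidence interval, you will probably want to check for heteroscedasticity first, because a single interval width is only valid for all of your data when the variance is constant.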
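A quick, informal check is to compare the residual variance early and late in the data. The sketch below uses made-up oven readings in which the noise is deliberately made to grow with time; the numbers and the straight-line model are assumptions for illustration only. Comparing the variance of two subsamples is the idea behind formal tests such as Goldfeld-Quandt, but this is not a substitute for one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic oven data (assumed): temperature starts at 20 °C and rises,
# and the measurement noise grows with time.
t = np.linspace(0, 300, 600)            # time in seconds
noise_scale = 0.5 + 0.02 * t            # larger spread later on
temp = 20.0 + 0.5 * t + rng.normal(scale=noise_scale)

# Fit a straight line and compute the residuals.
slope, intercept = np.polyfit(t, temp, deg=1)
residuals = temp - (slope * t + intercept)

# Compare the residual variance in the first and last third of the data.
k = t.size // 3
var_early = residuals[:k].var(ddof=1)
var_late = residuals[-k:].var(ddof=1)
ratio = var_late / var_early

print(f"variance ratio (late / early): {ratio:.1f}")
```

A ratio close to 1 is consistent with constant variance; a ratio far from 1, as here, means one fixed interval width will be too wide in one region and too narrow in the other.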

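You could probably try to get the distribution of your known outputs, compare the prediction against that curve, and check the p-value. But that only gives you a measure of how realistic the output is, without taking the input into account. If you know your inputs/outputs lie in a particular interval, this might work.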
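A minimal sketch of that idea, without any curve fitting, is to ask what fraction of the known outputs is at least as extreme as the prediction. The function name empirical_p_value is hypothetical and the known outputs here are randomly generated stand-ins; as noted above, the result ignores the input entirely.

```python
import numpy as np


def empirical_p_value(known_outputs, prediction):
    """Two-sided empirical tail probability of `prediction` among `known_outputs`."""
    known_outputs = np.asarray(known_outputs, dtype=float)
    frac_below = np.mean(known_outputs <= prediction)
    frac_above = np.mean(known_outputs >= prediction)
    # Double the smaller tail and cap the result at 1.
    return min(1.0, 2.0 * min(frac_below, frac_above))


rng = np.random.default_rng(2)
known = rng.normal(loc=0.0, scale=1.0, size=1000)  # stand-in for real outputs

p_typical = empirical_p_value(known, 0.1)   # near the centre of the data
p_extreme = empirical_p_value(known, 3.5)   # far out in the tail
print(p_typical, p_extreme)
```

A small value means the prediction is unusual compared with the outputs seen so far, which is a reason to be suspicious of it, not proof that it is wrong.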

Edit: This is what I would do. Obviously, outputs would be your actual outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
from scipy.interpolate import interp1d

N = 1000  # number of samples
mean = 0
std = 1
outputs = np.random.normal(loc=mean, scale=std, size=N)

# We want a normalized histogram (since this is a PDF, it must
# integrate to 1). `density=True` replaces the removed `normed=True`.
nbins = N // 10
n = N // nbins  # number of bins actually used (10 here)
p, x = np.histogram(outputs, bins=n, density=True)
plt.hist(outputs, bins=n, density=True)
x = x[:-1] + (x[1] - x[0]) / 2  # convert bin edges to bin centers

# Now we want to interpolate:
# f = CubicSpline(x=x, y=p, bc_type='not-a-knot')
f = interp1d(x=x, y=p, kind='quadratic', fill_value='extrapolate')

x = np.linspace(-2.9 * std, 2.9 * std, 10000)
plt.plot(x, f(x))
plt.show()

# To check: quad returns (integral, estimated absolute error).
area, abs_err = integrate.quad(f, x[0], x[-1])
print(area)  # should be close to 1
```

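Now, the interpolation method does not work well for outliers. If a predicted value is very far from your distribution (more than 3 times the standard deviation), it will not work. Other than that, you can now use the PDF to get meaningful results.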

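It's not perfect, but it's the best I could come up with in that time. I'm sure there are better ways to do it. If your data follows a normal distribution, it becomes trivial.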
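For the normal case, a minimal sketch could look like the following, using the standard library's statistics.NormalDist instead of a histogram and interpolation. The outputs are again randomly generated stand-ins and the prediction value is arbitrary; this only makes sense if your outputs really are roughly normal.

```python
import random
from statistics import NormalDist

random.seed(0)
# Stand-in for your real outputs (assumed to be roughly normal).
outputs = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Estimate the mean and standard deviation from the samples.
dist = NormalDist.from_samples(outputs)

prediction = 0.5  # arbitrary example value

# Density of the fitted normal at the prediction.
density = dist.pdf(prediction)

# Two-sided tail probability: how unusual is a value this far from the mean?
z = (prediction - dist.mean) / dist.stdev
two_sided_p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(density, two_sided_p)
```

Unlike the interpolated histogram, the fitted normal is defined everywhere, so values more than 3 standard deviations out still get a sensible (very small) density.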

Is there a way to get a prediction probability using XGBoostRegressor?

2 answers: