Question

I want to use some of the regularization methods that SGDClassifier provides, but it doesn't seem to learn correctly.

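My actual data lives in a high-dimensional space with multiple sigmoid curves, and I want to predict some held-out points. However, the results I get from SGDClassifier seem to have a lot of randomness and don't even fit the training data well.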

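Below is some minimal code that shows the problems I'm running into. I have many points along one dimension, and when I average the labels at each point the result is very sigmoidal. I want every point along the axis to be weighted equally, so I normalize each point's total weight to 1 and split it between the two labels.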

import numpy as np
import matplotlib.pyplot as plt
import sklearn as skl
import sklearn.linear_model

# Target sigmoid: slope B, midpoint M
B = .1
M = 64
x = np.arange(128)
y = 1. / (1. + np.exp(-B*(x-M)))

# Every x appears twice: once with label 0 (weight 1 - y) and once with label 1 (weight y)
train_x = np.concatenate([x, x]).reshape(-1, 1)
train_y = np.repeat([0, 1], len(x))
train_weights = np.concatenate([1.0 - y, y])

# Three identical fits, to show the run-to-run variation
# (newer scikit-learn releases name this loss 'log_loss' instead of 'log')
model = skl.linear_model.SGDClassifier(loss='log').fit(train_x, train_y, sample_weight=train_weights)
predicted_values = model.predict_proba(x.reshape(-1, 1))[:,1]

model = skl.linear_model.SGDClassifier(loss='log').fit(train_x, train_y, sample_weight=train_weights)
predicted_values2 = model.predict_proba(x.reshape(-1, 1))[:,1]

model = skl.linear_model.SGDClassifier(loss='log').fit(train_x, train_y, sample_weight=train_weights)
predicted_values3 = model.predict_proba(x.reshape(-1, 1))[:,1]

plt.plot(train_x, train_weights, label='weights')
plt.plot(x, y, label='training')
plt.plot(x, predicted_values, label='predicted')
plt.plot(x, predicted_values2, label='predicted2')
plt.plot(x, predicted_values3, label='predicted3')
plt.legend()
plt.show()
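
The weighting scheme described above can be sanity-checked in isolation. The following is a minimal sketch that rebuilds the same arrays and confirms that each point carries a total weight of 1 and that the weighted mean label at each point equals the sigmoid value. The names `n`, `per_point_total` and `weighted_mean_label` are introduced here for illustration only and are not part of the original code.

```python
import numpy as np

# Same construction as in the question
B = 0.1
M = 64
x = np.arange(128)
y = 1.0 / (1.0 + np.exp(-B * (x - M)))

train_y = np.repeat([0, 1], len(x))
train_weights = np.concatenate([1.0 - y, y])

n = len(x)
w_label0 = train_weights[:n]   # weight of the (x, label 0) copy
w_label1 = train_weights[n:]   # weight of the (x, label 1) copy

# Total weight assigned to each x across both labels
per_point_total = w_label0 + w_label1

# Weighted average of the labels at each x (label 0 contributes nothing)
weighted_mean_label = (0 * w_label0 + 1 * w_label1) / per_point_total

print(np.allclose(per_point_total, 1.0))
print(np.allclose(weighted_mean_label, y))
```

If both checks hold, the training set encodes the sigmoid as intended, which suggests the issue is in the fitting step rather than in how the weights are built.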

Sometimes SGDClassifier seems to learn a step function, and sometimes it doesn't seem to learn at all.
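
For comparison, here is a minimal sketch of a deterministic baseline on the same data. It assumes that scikit-learn's `LogisticRegression` with a large `C` (so that regularization is close to switched off) is an acceptable stand-in; it is not SGDClassifier and does not use its regularization options. Because the weighted labels follow a logistic curve exactly, a well-behaved fit would be expected to recover a slope near `B` and an intercept near `-B*M`. The names `ref_model`, `slope`, `intercept` and `ref_pred` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as in the question
B = 0.1
M = 64
x = np.arange(128)
y = 1.0 / (1.0 + np.exp(-B * (x - M)))

train_x = np.concatenate([x, x]).reshape(-1, 1)
train_y = np.repeat([0, 1], len(x))
train_weights = np.concatenate([1.0 - y, y])

# Assumption: a large C makes the L2 penalty negligible, so this is
# close to an unregularized weighted logistic regression
ref_model = LogisticRegression(C=1e6, max_iter=1000)
ref_model.fit(train_x, train_y, sample_weight=train_weights)

slope = ref_model.coef_[0, 0]
intercept = ref_model.intercept_[0]
ref_pred = ref_model.predict_proba(x.reshape(-1, 1))[:, 1]

print("slope:", slope, "target:", B)
print("intercept:", intercept, "target:", -B * M)
print("max abs error vs. sigmoid:", np.max(np.abs(ref_pred - y)))
```

Plotting `ref_pred` next to the three SGDClassifier curves gives a fixed reference against which the run-to-run variation can be judged.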

skl SGDClassifier fails at weighted logistic regression

0 Answers: