Question

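I have been following Professor Ng's lectures and am trying to implement an SVM with TensorFlow in my Jupyter notebook. However, my model does not seem to converge properly.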

[Figure: scatter plot after 5000 steps of training]

I think my loss function is wrong, which may be why the model ends up fitting poorly.

Here is the full graph construction code for my model:

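import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

# num_data (the number of training examples, which double as landmarks)
# is assumed to be defined earlier in the notebook.
tf.reset_default_graph()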

#training hyper parameters

learning_rate = 0.000001
C = 20
gamma = 50

X = tf.placeholder(tf.float32, shape=(None,2))
Y = tf.placeholder(tf.float32, shape=(None,1))
landmark = tf.placeholder(tf.float32, shape=(None,2))

W = tf.Variable(np.random.random((num_data)),dtype=tf.float32)
B = tf.Variable(np.random.random((1)),dtype=tf.float32)

batch_size = tf.shape(X)[0]

#RBF Kernel
tile = tf.tile(X, (1,num_data))
diff = tf.reshape( tile, (-1, num_data, 2)) - landmark
tile_shape = tf.shape(diff)
sq_diff = tf.square(diff)
sq_dist = tf.reduce_sum(sq_diff, axis=2)
F = tf.exp(tf.negative(sq_dist * gamma))

WF = tf.reduce_sum(W * F, axis=1) + B

condition = tf.greater_equal(WF, 0)
H = tf.where(condition,  tf.ones_like(WF),tf.zeros_like(WF))

ERROR_LOSS = C * tf.reduce_sum(Y * tf.maximum(0.,1-WF) + (1-Y) * tf.maximum(0.,1+WF))
WEIGHT_LOSS = tf.reduce_sum(tf.square(W))/2

TOTAL_LOSS = ERROR_LOSS + WEIGHT_LOSS

optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train = optimizer.minimize(TOTAL_LOSS)

I use a Gaussian kernel with the entire training set as landmarks.
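To make the shapes easier to inspect outside the graph, here is a minimal NumPy sketch of the same landmark features. The helper name `rbf_features` and the toy data are hypothetical and only serve as an illustration; this is not the code from the notebook.

```python
import numpy as np

def rbf_features(X, landmarks, gamma):
    """Return F with F[i, j] = exp(-gamma * ||X[i] - landmarks[j]||^2)."""
    # (n, 1, d) - (1, m, d) broadcasts to (n, m, d)
    diff = X[:, None, :] - landmarks[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)      # shape (n, m)
    return np.exp(-gamma * sq_dist)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                  # hypothetical toy data
F = rbf_features(X, X, gamma=50.0)           # training set used as landmarks

w = rng.normal(size=5)                       # one weight per landmark
b = 0.1
scores = F @ w + b                           # shape (5,)
y_col = np.ones((5, 1))                      # labels stored as a column, like Y

print(F.shape, scores.shape, (y_col * scores).shape)
```

One thing this makes easy to check: a column of labels with shape (n, 1) multiplied elementwise by scores with shape (n,) broadcasts to (n, n) rather than (n,). Since Y is declared with shape (None, 1) while WF is reduced to one dimension, it may be worth verifying the shapes of the loss terms in the graph.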

Assuming I implemented it correctly, the loss function is exactly the one shown in the lecture.

[Figure: loss function from the lecture]
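The figure is not reproduced here, so as a reference, this is the objective that the code above appears to encode, written with labels in {0, 1}. The symbols are assumptions chosen to match the code: \theta corresponds to W, f^{(i)} to the i-th row of F, and the intercept B is folded into \theta^{T} f^{(i)}.

```latex
\min_{\theta}\; C \sum_{i=1}^{m} \Big[ y^{(i)}\, \mathrm{cost}_1\!\big(\theta^{T} f^{(i)}\big)
  + \big(1 - y^{(i)}\big)\, \mathrm{cost}_0\!\big(\theta^{T} f^{(i)}\big) \Big]
  + \frac{1}{2} \sum_{j=1}^{m} \theta_j^{2},
\qquad
\mathrm{cost}_1(z) = \max(0,\, 1 - z), \quad
\mathrm{cost}_0(z) = \max(0,\, 1 + z).
```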

I'm pretty sure I'm missing something.

Answer 1

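Note that the kernel matrix should have batch_size^2 entries, whereas your tensor WF has shape (batch_size, 2). The idea is to compute K(x_i, x_j) for every pair (x_i, x_j) in the dataset and then use these kernel values as the inputs to the SVM.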
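As a rough illustration of computing K(x_i, x_j) for every pair, the following NumPy sketch builds the full kernel matrix from the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b. The function name `rbf_gram` and the three sample points are made up for the example.

```python
import numpy as np

def rbf_gram(X, gamma):
    """Return the n x n matrix K with K[i, j] = exp(-gamma * ||X[i] - X[j]||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)                      # shape (n,)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dist = np.maximum(sq_dist, 0.0)                     # clip round-off negatives
    return np.exp(-gamma * sq_dist)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])         # hypothetical points
K = rbf_gram(X, gamma=0.5)

print(K.shape)   # one entry per pair of points
```

The result is symmetric with ones on the diagonal, which gives a quick sanity check before feeding it into an optimizer.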

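I'm using Andrew Ng's lecture notes on SVMs as a reference; on page 20 he derives the final optimization problem. You need to replace the inner product <x_i, x_j> with the kernel function.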
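The page itself is not quoted here, so the following is an assumption: if the problem referred to is the usual soft-margin dual, then after substituting the kernel for the inner product it would read as below. Note that this formulation uses labels y^{(i)} in {-1, +1}, not {0, 1} as in the question's code.

```latex
\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j\, K\big(x^{(i)}, x^{(j)}\big)
\quad \text{s.t.} \quad
0 \le \alpha_i \le C \;\; (i = 1, \dots, m), \qquad
\sum_{i=1}^{m} \alpha_i\, y^{(i)} = 0.
```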

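I'd recommend starting with a linear kernel instead of RBF, and comparing your code against an out-of-the-box SVM implementation such as sklearn's. That helps make sure your optimization code is working correctly.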
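A linear baseline along these lines could look like the sketch below. It uses plain NumPy subgradient descent on the same hinge-style loss as the question, with labels in {0, 1}; the function name `train_linear_svm`, the hyperparameters, and the synthetic clusters are all assumptions made for illustration. The learned weights could then be compared against a library SVM with a linear kernel.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, steps=500):
    """Subgradient descent on C * sum(hinge terms) + 0.5 * ||w||^2, y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        z = X @ w + b                                   # scores, shape (n,)
        # Derivative of each example's loss w.r.t. its score:
        # -1 for a positive with z < 1, +1 for a negative with z > -1, else 0.
        dz = (np.where((y == 1) & (z < 1), -1.0, 0.0)
              + np.where((y == 0) & (z > -1), 1.0, 0.0))
        grad_w = C * (X.T @ dz) + w                     # hinge part + regularizer
        grad_b = C * np.sum(dz)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two well-separated synthetic clusters (hypothetical data).
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(20), np.zeros(20)])

w, b = train_linear_svm(X, y)
pred = (X @ w + b >= 0).astype(float)
accuracy = np.mean(pred == y)
print(w, b, accuracy)
```

Keeping the scores and labels both one-dimensional, as here, avoids accidental broadcasting in the loss terms.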

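One last note: although it should be possible to train an SVM with gradient descent, SVMs are almost never trained that way in practice. The SVM optimization problem can be solved with quadratic programming, and most methods for training SVMs take advantage of this.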

SVM TensorFlow implementation