Negative KL divergence in t-SNE

Time: 2016-06-30 17:25:40

Tags: python data-visualization

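I am using t-SNE on a dataset of 1262 data points with 3288 dimensions. I have run t-SNE several times and monitored the KL divergence in order to pick the run with the lowest KL value, as suggested in Laurens van der Maaten's FAQ.

However, I ended up with negative KL divergence values, which I do not understand, since in my view the KL divergence should be non-negative according to Gibbs' inequality.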

from sklearn.manifold import TSNE

model = TSNE(n_components=2, verbose=2, perplexity=30, init='pca', learning_rate=1000)
Projection = model.fit_transform(PartsQTYperCar)

[t-SNE] Computing pairwise distances...
[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Computed conditional probabilities for sample 1000 / 1262
[t-SNE] Computed conditional probabilities for sample 1262 / 1262
[t-SNE] Mean sigma: 0.000000
[t-SNE] Iteration 25: error = 0.0827444, gradient norm = 0.0075889
[t-SNE] Iteration 50: error = 0.0747056, gradient norm = 0.0062666
[t-SNE] Iteration 75: error = 0.0618893, gradient norm = 0.0047267
[t-SNE] Iteration 100: error = 0.0302985, gradient norm = 0.0051550
[t-SNE] Error after 100 iterations with early exaggeration: 0.030298
[t-SNE] Iteration 125: error = -0.0109731, gradient norm = 0.0038695
[t-SNE] Iteration 150: error = -0.0086051, gradient norm = 0.0035785
[t-SNE] Iteration 175: error = -0.0211171, gradient norm = 0.0039716
[t-SNE] Iteration 200: error = 0.0006259, gradient norm = 0.0032799
[t-SNE] Iteration 225: error = -0.0061129, gradient norm = 0.0032859
[t-SNE] Iteration 225: did not make any progress during the last 30 episodes. Finished.
[t-SNE] Error after 225 iterations: -0.006113
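
For reference, the non-negativity expected from Gibbs' inequality can be checked numerically. The following is a minimal sketch in plain Python; the helper `kl_divergence` and the example vectors are made up for illustration and are not taken from scikit-learn. It only shows that the sum is non-negative when both inputs are proper probability distributions, and that it can drop below zero when one of them does not sum to 1. It makes no claim about what happens inside the t-SNE implementation.

```python
import math

def kl_divergence(p, q):
    """Sum of p_i * log(p_i / q_i) over the entries where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]                 # sums to 1
q = [0.4, 0.4, 0.2]                 # sums to 1: a proper distribution
q_unnormalized = [0.6, 0.6, 0.3]    # hypothetical broken input: sums to 1.5

kl_ok = kl_divergence(p, q)
kl_broken = kl_divergence(p, q_unnormalized)

print(kl_ok)      # non-negative, as Gibbs' inequality requires
print(kl_broken)  # negative, because q_unnormalized is not a distribution
```
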

1 Answer:

Answer 0 (score: 0)

{{1}}

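As @vbvx pointed out, removing the duplicate data points solved the problem. Although @vbvx mentioned that duplicates violate the second condition of the KL divergence above, a duplicated value of i should only change the frequency of i, and therefore P(i) and Q(i).

However, if the duplicates are treated as distinct items and exist in one of the distributions P or Q, then the support of P will differ from the support of Q, and the second condition will not hold.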
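
Removing duplicate rows before running t-SNE could look roughly like the sketch below. It assumes the data is a 2-D NumPy array with one row per data point; the small matrix `X` is a hypothetical stand-in for the real data, and only exact duplicates are dropped.

```python
import numpy as np

# Hypothetical stand-in for the real feature matrix (rows = data points).
X = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 2.0],   # exact duplicate of row 0
    [4.0, 1.0, 0.0],
])

# np.unique with axis=0 keeps one copy of each distinct row;
# return_index gives the position of the first occurrence of each kept row.
_, first_idx = np.unique(X, axis=0, return_index=True)

# Sort the indices so the surviving rows keep their original order.
keep = np.sort(first_idx)
X_dedup = X[keep]

print(X.shape, "->", X_dedup.shape)
```

The deduplicated array would then be passed to `fit_transform` in place of the original matrix, and `keep` can be used to map the embedded points back to the original rows.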