I'm using GaussianMixture to cluster 8-dimensional feature vectors. I configured GaussianMixture as follows:
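```python
from sklearn.mixture import GaussianMixture

class_no = 100  # number of mixture components used in the question

gm = GaussianMixture(n_components=class_no,
                     tol=1e-6,
                     covariance_type="spherical",
                     init_params="random",
                     verbose=2, verbose_interval=1)
```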
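I have about 10 million sample vectors, and class_no is 100. The GaussianMixture fit terminates prematurely, right after initialization.
If I reduce the number of samples to 1-2 million, the fit runs normally.
What could be causing this?
Also, when the fit does not terminate prematurely, I see that the ll change reported for the first iteration is inf. Is that normal?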
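On the `inf` in iteration 1: as far as I can tell from scikit-learn's EM loop, the previous lower bound starts at `-inf` on a fresh fit, so the first reported change is always infinite and is harmless. Fitting then stops at the first iteration where the absolute change drops below `tol`. The sketch below replays only that stopping rule over a made-up list of lower-bound values; `em_stopping_trace` and the numbers are hypothetical, chosen to mimic a fit whose first real step is tiny.

```python
import math


def em_stopping_trace(lower_bounds, tol):
    """Replay an EM stopping rule over per-iteration lower bounds.

    Hypothetical helper: the previous lower bound starts at -inf, and the
    loop stops at the first iteration whose absolute change is below `tol`.
    Returns (stop_iteration or None, list_of_changes).
    """
    prev = -math.inf
    changes = []
    for n_iter, lower_bound in enumerate(lower_bounds, start=1):
        change = lower_bound - prev  # iteration 1: finite - (-inf) == inf
        changes.append(change)
        if abs(change) < tol:
            return n_iter, changes  # treated as "converged"
        prev = lower_bound
    return None, changes  # tolerance never met within the given values


# Made-up lower bounds: an almost flat first step, then real progress.
bounds = [-12.0, -11.9999995, -11.99, -11.5, -11.0]
print(em_stopping_trace(bounds, tol=1e-6))  # stops at iteration 2
print(em_stopping_trace(bounds, tol=1e-7))  # never meets tol on these values
```

With `tol=1e-6` the tiny second step (about 5e-7) is mistaken for convergence, while `tol=1e-7` lets the loop continue into the iterations where the lower bound actually moves.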
Answer:
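After more experiments, I think I may have found the problem. The log at the bottom shows the ll change over the first 30 iterations when I run GaussianMixture on 1M samples. The general trend is that the ll change starts out very small, gradually rises to a peak, and then converges to zero. As the number of samples grows, the changes in the first few iterations become smaller and smaller. By the time I use 10 million samples, the ll change at iteration 2 is already below my threshold of 1e-6. If I lower the threshold to 1e-7, the fit runs to completion, just as it does with the smaller sample sizes.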
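One plausible reason for this trend, offered as an assumption rather than a confirmed diagnosis: with `init_params="random"`, scikit-learn gives each sample uniformly random responsibilities (normalized so each row sums to 1) and then computes every component mean as a responsibility-weighted average. With millions of samples those weighted averages all land very close to the global mean, so the components start out nearly identical and EM needs several iterations to pull them apart. The numpy sketch below mimics only that initialization on synthetic standard-normal data (the helper name, the data, and `n_components=10` are illustrative) and reports how far the initial means sit from the global mean.

```python
import numpy as np


def random_init_mean_spread(n_samples, n_features=8, n_components=10, seed=0):
    """Largest distance between an initial component mean and the global mean.

    Mimics a 'random responsibilities' initialization: uniform random
    responsibilities normalized per sample, then weighted means.
    Synthetic standard-normal data; all names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_features))
    resp = rng.uniform(size=(n_samples, n_components))
    resp /= resp.sum(axis=1, keepdims=True)  # each row now sums to 1
    weights = resp.sum(axis=0)  # effective sample count per component
    means = resp.T @ X / weights[:, np.newaxis]  # shape (n_components, n_features)
    return float(np.linalg.norm(means - X.mean(axis=0), axis=1).max())


for n in (1_000, 10_000, 100_000):
    print(n, random_init_mean_spread(n))
```

The spread should shrink roughly like 1/sqrt(n), i.e. about 10x for 100x more samples. If this is the cause, the default `init_params="kmeans"`, which starts from k-means cluster assignments, may avoid the near-symmetric start, at the cost of running k-means on all 10 million samples.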
Does this mean the ll change should be normalized somehow?
```
Initialization 0
Iteration 1 time lapse 10.15743s ll change inf
Iteration 2 time lapse 7.76352s ll change 0.00001
Iteration 3 time lapse 7.83711s ll change 0.00006
Iteration 4 time lapse 7.83133s ll change 0.00044
Iteration 5 time lapse 8.18798s ll change 0.00317
Iteration 6 time lapse 7.78111s ll change 0.02268
Iteration 7 time lapse 7.95682s ll change 0.13413
Iteration 8 time lapse 7.87189s ll change 0.38677
Iteration 9 time lapse 7.75204s ll change 0.53651
Iteration 10 time lapse 7.73964s ll change 0.46236
Iteration 11 time lapse 7.75558s ll change 0.55855
Iteration 12 time lapse 7.75457s ll change 0.57340
Iteration 13 time lapse 7.77386s ll change 0.21811
Iteration 14 time lapse 7.75011s ll change 0.09917
Iteration 15 time lapse 7.78765s ll change 0.06162
Iteration 16 time lapse 7.81858s ll change 0.04783
Iteration 17 time lapse 7.76057s ll change 0.04079
Iteration 18 time lapse 7.73551s ll change 0.03687
Iteration 19 time lapse 7.82454s ll change 0.03416
Iteration 20 time lapse 7.78091s ll change 0.02830
Iteration 21 time lapse 7.78215s ll change 0.02189
Iteration 22 time lapse 7.75392s ll change 0.01775
Iteration 23 time lapse 7.77399s ll change 0.01523
Iteration 24 time lapse 7.73693s ll change 0.01342
Iteration 25 time lapse 7.74950s ll change 0.01205
Iteration 26 time lapse 7.73767s ll change 0.01107
Iteration 27 time lapse 7.79130s ll change 0.01021
Iteration 28 time lapse 7.76402s ll change 0.00916
Iteration 29 time lapse 7.77638s ll change 0.00799
Iteration 30 time lapse 7.76722s ll change 0.00695
```
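On whether the ll change should be normalized: in scikit-learn the `ll change` printed with `verbose=2` is, to my understanding, already the change in the per-sample mean of the log-likelihood lower bound, not the total, so it does not grow with the number of samples. The small numpy check below (synthetic data; `gaussian_loglik` is a hypothetical helper) shows why an average behaves that way: the mean log-density of a fixed 8-dimensional standard normal is about the same on 1,000 or 100,000 samples, while the sum grows roughly linearly with n.

```python
import numpy as np


def gaussian_loglik(X):
    """Per-sample log-density of X under a d-dimensional standard normal."""
    d = X.shape[1]
    return -0.5 * (d * np.log(2.0 * np.pi) + np.sum(X**2, axis=1))


rng = np.random.default_rng(0)
for n in (1_000, 100_000):
    ll = gaussian_loglik(rng.standard_normal((n, 8)))
    # The mean stays near -0.5 * (8 * log(2*pi) + 8); the sum scales with n.
    print(n, "mean:", ll.mean(), "sum:", ll.sum())
```

Because the reported quantity is already an average, dividing by n again would only make the early changes smaller; lowering `tol`, as in the answer, addresses the flat start directly.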