scikit-learn GaussianMixture terminates immediately after initialization

Time: 2019-02-26 17:50:22

Tags: scikit-learn

I am using GaussianMixture to cluster 8-dimensional feature vectors. I set up GaussianMixture as follows:

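from sklearn.mixture import GaussianMixture

class_no = 100  # number of mixture components in my case

gm = GaussianMixture(n_components=class_no,
                     tol=1e-6,
                     covariance_type="spherical",
                     init_params="random",
                     verbose=2, verbose_interval=1)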

I have about 10 million sample vectors, and class_no is 100. GaussianMixture starts fitting and then terminates prematurely, right after initialization.

If I reduce the number of samples to 1-2 million, the fit runs normally.

What could be the cause?

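In addition, when the fit does not terminate prematurely, I see that the ll change reported for the first iteration is inf. Is that normal?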

1 Answer:

Answer 0 (score: 0)

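After more experiments, I may have found the problem. If I run the GM with 1M samples, the ll change over the first 30 iterations is as shown at the bottom. The general trend is that the ll change starts out very small, gradually rises to a peak, and then converges to zero. As the number of samples grows, the change in the first few steps becomes smaller and smaller. By the time I use 10 million samples, the ll change at iteration 2 is already below my threshold of 1e-6. If I reduce the threshold to 1e-7, the fit runs to completion just as it does with the smaller sample sizes.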

It looks like the ll change should be normalized somehow?
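One possible explanation, offered as an assumption rather than something confirmed in this thread: as far as I can tell, the lower bound scikit-learn reports is already a per-sample mean, so the shrinking early changes may come from init_params="random" instead. That option starts from random soft assignments, which makes every component's initial mean close to the global mean, and the differences between components shrink roughly like 1/sqrt(n_samples). The NumPy sketch below only illustrates that effect on synthetic data; the function name, the standard-normal data and the 5 components are hypothetical choices, and it does not call scikit-learn.

```python
import numpy as np

def initial_mean_spread(n_samples, n_components=5, n_features=8, seed=0):
    """Spread of the component means right after a random soft assignment."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in data: standard-normal 8-dimensional vectors.
    X = rng.normal(size=(n_samples, n_features))
    # Random responsibilities, normalized so each row sums to 1.
    resp = rng.uniform(size=(n_samples, n_components))
    resp /= resp.sum(axis=1, keepdims=True)
    # Responsibility-weighted mean of each component (one M-step for the means).
    means = (resp.T @ X) / resp.sum(axis=0)[:, np.newaxis]
    # Std of the means across components, averaged over the features.
    return means.std(axis=0).mean()

small = initial_mean_spread(1_000)
large = initial_mean_spread(100_000)
print(f"spread of initial means, 1e3 samples: {small:.5f}")
print(f"spread of initial means, 1e5 samples: {large:.5f}")
```

If this is what is happening, the components start out nearly identical on large data sets and the first EM steps barely move the likelihood. Lowering tol works around it, as found above; init_params="kmeans" (the default) is another thing to try, since it starts the components apart from each other.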

Initialization 0
  Iteration 1    time lapse 10.15743s    ll change inf
  Iteration 2    time lapse 7.76352s     ll change 0.00001
  Iteration 3    time lapse 7.83711s     ll change 0.00006
  Iteration 4    time lapse 7.83133s     ll change 0.00044
  Iteration 5    time lapse 8.18798s     ll change 0.00317
  Iteration 6    time lapse 7.78111s     ll change 0.02268
  Iteration 7    time lapse 7.95682s     ll change 0.13413
  Iteration 8    time lapse 7.87189s     ll change 0.38677
  Iteration 9    time lapse 7.75204s     ll change 0.53651
  Iteration 10   time lapse 7.73964s     ll change 0.46236
  Iteration 11   time lapse 7.75558s     ll change 0.55855
  Iteration 12   time lapse 7.75457s     ll change 0.57340
  Iteration 13   time lapse 7.77386s     ll change 0.21811
  Iteration 14   time lapse 7.75011s     ll change 0.09917
  Iteration 15   time lapse 7.78765s     ll change 0.06162
  Iteration 16   time lapse 7.81858s     ll change 0.04783
  Iteration 17   time lapse 7.76057s     ll change 0.04079
  Iteration 18   time lapse 7.73551s     ll change 0.03687
  Iteration 19   time lapse 7.82454s     ll change 0.03416
  Iteration 20   time lapse 7.78091s     ll change 0.02830
  Iteration 21   time lapse 7.78215s     ll change 0.02189
  Iteration 22   time lapse 7.75392s     ll change 0.01775
  Iteration 23   time lapse 7.77399s     ll change 0.01523
  Iteration 24   time lapse 7.73693s     ll change 0.01342
  Iteration 25   time lapse 7.74950s     ll change 0.01205
  Iteration 26   time lapse 7.73767s     ll change 0.01107
  Iteration 27   time lapse 7.79130s     ll change 0.01021
  Iteration 28   time lapse 7.76402s     ll change 0.00916
  Iteration 29   time lapse 7.77638s     ll change 0.00799
  Iteration 30   time lapse 7.76722s     ll change 0.00695
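
To make the effect of tol on this log concrete, here is a small sketch of the stopping rule as I understand it: the fit stops at the first iteration where abs(ll change) < tol, and the previous bound starts at -inf, which would explain the inf reported at iteration 1. The helper name first_converged_iteration is made up for illustration, and the values are the rounded numbers printed in the log, not the exact ones.

```python
import math

# "ll change" for the first ten iterations, copied from the 1M-sample log above
# (rounded to 5 decimals by the verbose output).
ll_changes = [math.inf, 0.00001, 0.00006, 0.00044, 0.00317,
              0.02268, 0.13413, 0.38677, 0.53651, 0.46236]

def first_converged_iteration(changes, tol):
    """Return the 1-based iteration where abs(change) < tol, or None."""
    for iteration, change in enumerate(changes, start=1):
        if abs(change) < tol:
            return iteration
    return None

# Starting from a previous bound of -inf, the first change is always inf,
# so iteration 1 can never trigger the stop.
first_change = -1.0 - (-math.inf)

stop_tight = first_converged_iteration(ll_changes, tol=1e-6)
stop_loose = first_converged_iteration(ll_changes, tol=1e-4)
print(first_change, stop_tight, stop_loose)
```

With tol=1e-6 nothing in these ten iterations is small enough to stop the fit, while a looser tol such as 1e-4 would already stop at iteration 2, long before the changes peak. That matches the 10M-sample run, where the iteration 2 change dropped below 1e-6.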