MLlib KMeans shows random behavior

Asked: 2017-03-30 07:56:45

Tags: scala apache-spark cluster-analysis k-means apache-spark-mllib

I am using KMeans from Spark 1.6.1 with Scala and I am observing random behavior.

To my understanding, the only random part is the initialization of the initial cluster centers, which I have addressed as shown below.

The experiment goes as follows: I run KMeans once and get a model; this first run initializes the centers randomly. After obtaining that model, I run the following code:

//val latestModel: KMeansModel was trained earlier

val km: KMeans = new KMeans()
km.setK(numberOfClusters)
km.setMaxIterations(20)
if (latestModel != null) {
  if (latestModel.k == numberOfClusters) {
    logger.info("Using cluster centers from previous model")
    km.setInitialModel(latestModel) // Set initial cluster centers
  }
}

kmeansModel = KMeans.train(dataAfterPCA, numberOfClusters, 20)
println("Run#1")
kmeansModel.clusterCenters.foreach(t => println(t))
kmeansModel = KMeans.train(dataAfterPCA, numberOfClusters, 20)
println("Run#2")
kmeansModel.clusterCenters.foreach(t => println(t))

As you can see, I use the centers from latestModel and then look at the printed output.

The cluster centers differ between the two runs:

Run#1
[0.20910608631141306,0.2008812839967183,0.27863526709646663,0.17173268189352492,0.4068108508134425,1.5978368739711135,-0.03644171546864227,-0.034547377483902755,-0.30757069112989693,-0.04681453873202328,-0.03432819320158391,-0.0229510885384198,0.16155254061277455]
[-0.9986167379861676,-0.4228356715735266,-0.9797043073290139,-0.48157892793353135,-0.7818198908298358,-0.3991524190947045,-0.09142025949212684,-0.034547396992719734,-0.4149601436468508,-0.04681453873202326,56.38182990388363,-0.027308795774228338,-0.8567167533956337]
[0.40443230723847245,0.40753014996762926,0.48063940183378684,0.37038231765864527,0.615561235153811,-0.1546334408565992,1.1517155044090817,-0.034547396992719734,0.17947924999402878,22.44497403279252,-0.04625456310989393,-0.027308795774228335,0.3521192856019467]
[0.44614142085922764,0.39183992738993073,0.5599716298711428,0.31737580128115594,0.8674951428275776,0.799192261554564,1.090005738447001,-0.034547396992719734,-0.10481785131247881,-0.04681453873202326,-0.04625456310989393,41.936484571639795,0.4864344010461224]
[0.3506753428299332,0.3395786568210998,0.45443729624612045,0.3115089688709545,0.4762387976829325,11.3438592782776,0.04041394221229458,-0.03454735647587367,1.0065342405811888,-0.046814538732023264,-0.04625456310989393,-0.02730879577422834,0.19094114706893608]
[0.8238890515931856,0.8366348841253755,0.9862410283176735,0.7635549199270218,1.1877685458769478,0.7813626284105487,38.470668704621396,-0.03452467509554947,-0.4149294724823659,-0.04681453873202326,1.2214455451195836,-0.0212002267696445,1.1580099782670004]
[0.21425069771110813,0.22469514482272127,0.30113774986108593,0.182605001533264,0.4637631333393578,0.029033109984974183,-0.002029301682406235,-0.03454739699271971,2.397309416381941,0.011941957462594896,-0.046254563109893905,-0.018931196565979497,0.35297479589140024]
[-0.6546798328639079,-0.6358370654999287,-0.7928424675098332,-0.5071485895971765,-0.7400917528763642,-0.39717704681705857,-0.08938412993092051,-0.02346229974103403,-0.40690957159820434,-0.04681453873202331,-0.023692354206657835,-0.024758557139368385,-0.6068025631839297]
[-0.010895214450242299,-0.023949109470308646,-0.07602949287623037,-0.018356772906618683,-0.39876455727035937,-0.21260655806916112,-0.07991736890951397,-0.03454278343886248,-0.3644711133467814,-0.04681453873202319,-0.03250578362850749,-0.024761896110663685,-0.09605183996736125]
[0.14061295519424166,0.14152409771288327,0.1988841951819923,0.10943684592384875,0.3404665467004296,-0.06397788416055701,0.030711112793548753,0.044173951636969355,-0.08950950493941498,-0.039099833378049946,-0.03265898863536165,-0.02406954910363843,0.16029254891067157]
Run#2
[0.11726347529467256,0.11240236056044385,0.145845029386598,0.09061870140058333,0.15437020046635777,0.03499211466800115,-0.007112193875767524,-0.03449302405046689,-0.20652827212743696,-0.041880871009984943,-0.042927843040582066,-0.024409659630584803,0.10595250123068904]
[-0.9986167379861676,-0.4228356715735266,-0.9797043073290139,-0.48157892793353135,-0.7818198908298358,-0.3991524190947045,-0.09142025949212684,-0.034547396992719734,-0.4149601436468508,-0.04681453873202326,56.38182990388363,-0.027308795774228338,-0.8567167533956337]
[0.40443230723847245,0.40753014996762926,0.48063940183378684,0.37038231765864527,0.615561235153811,-0.1546334408565992,1.1517155044090817,-0.034547396992719734,0.17947924999402878,22.44497403279252,-0.04625456310989393,-0.027308795774228335,0.3521192856019467]
[0.44614142085922764,0.39183992738993073,0.5599716298711428,0.31737580128115594,0.8674951428275776,0.799192261554564,1.090005738447001,-0.034547396992719734,-0.10481785131247881,-0.04681453873202326,-0.04625456310989393,41.936484571639795,0.4864344010461224]
[0.056657434641233205,0.03626919750209713,0.1229690343482326,0.015190756508711958,-0.278078039715814,-0.3991255672375599,0.06613236052364684,28.98230095429352,-0.4149601436468508,-0.04681453873202326,-0.04625456310989393,-0.027308795774228338,-0.31945629161893124]
[0.8238890515931856,0.8366348841253755,0.9862410283176735,0.7635549199270218,1.1877685458769478,0.7813626284105487,38.470668704621396,-0.03452467509554947,-0.4149294724823659,-0.04681453873202326,1.2214455451195836,-0.0212002267696445,1.1580099782670004]
[-0.17971932675588306,-7.925508727413683E-4,-0.08990036350145142,-0.033456211225756705,-0.1514393713761394,-0.08538399305051374,-0.09132371177664707,-0.034547396992719734,-0.19858350916572132,-0.04681453873202326,4.873470425033645,-0.023394262810850164,0.15064661243568334]
[-0.4488579509785471,-0.4428314704219248,-0.5776049270843375,-0.3580559344350086,-0.6787807800457122,-0.378841125619109,-0.08742047856626034,-0.027746008987067004,-0.3951588549839565,-0.046814538732023264,-0.04625456310989399,-0.02448638761790114,-0.4757072927512256]
[0.2986301685357443,0.2895405124404614,0.39435230210861016,0.2549716029318805,0.5238783183359862,5.629286423487358,0.012002410566794644,-0.03454737293733725,0.1657346440290886,-0.046814538732023264,-0.03653898382838679,-0.025149508122450703,0.2715302163354414]
[0.2072253546037051,0.21958064267615496,0.29431697644435456,0.17741927849917147,0.4521349932664591,-0.010031680919536882,3.9433761322307554E-4,-0.03454739699271971,2.240412962951767,0.005598926623403161,-0.046254563109893905,-0.018412129948368845,0.33990882056156724]

I am trying to understand where this random behavior comes from and how to avoid it; I could not find anything relevant in the Git sources.

Any ideas/suggestions? I need the behavior to be stable.

1 Answer:

Answer 0 (score: 0):

This is normal. Every time you train the model, the cluster centers are initialized randomly. If you set the number of iterations high enough, the results will converge.

You should call run() on the KMeans instance you configured, i.e. km.run(dataAfterPCA), instead of the static KMeans.train(). The static method builds its own KMeans object internally, so the initial model you set with km.setInitialModel() is never used.
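
A minimal sketch of that change, assuming the same dataAfterPCA, numberOfClusters and latestModel values as in the question (trainStable is just an illustrative helper name, not part of the Spark API): it trains through the configured instance, and also pins a seed so that runs without an initial model are reproducible as well.

import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Train through the configured instance so setInitialModel/setSeed actually take effect.
def trainStable(data: RDD[Vector], k: Int, previous: KMeansModel): KMeansModel = {
  val km = new KMeans()
    .setK(k)
    .setMaxIterations(20)
    .setSeed(1L) // fixed seed: the k-means|| initialization becomes deterministic
  if (previous != null && previous.k == k) {
    km.setInitialModel(previous) // start from the previous centers instead of a random init
  }
  km.run(data) // instance method; unlike the static KMeans.train(), it uses the settings above
}

val model1 = trainStable(dataAfterPCA, numberOfClusters, latestModel)
val model2 = trainStable(dataAfterPCA, numberOfClusters, latestModel)
// model1.clusterCenters and model2.clusterCenters should now match

With this setup, consecutive runs on the same data start from the same centers (either latestModel or the seeded initialization), so the printed clusterCenters should no longer change between Run#1 and Run#2.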