Self-organizing map in R produces one large cluster and several small clusters

Time: 2017-07-16 15:00:37

Tags: r machine-learning hierarchical-clustering

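I am currently doing some work for a charity, and I am using self-organizing maps to cluster donors in R. This is the R code I am using at the moment: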

library(dplyr)
library(kohonen)

setwd('d:\\Bla')

OrginalData <- read.table("InputForSom1.txt",
                   header = TRUE, sep = "\t")

SubsetData <- subset(OrginalData, select = c(
"Frequency2013" 
,"Sum2013"  
,"Frequency2014"    
,"Sum2014"  
,"Frequency2015"    
,"Sum2015"  
,"Frequency2016"    
,"Sum2016"  
,"Frequency2017"    
,"Sum2017"
#,"Easting"
#,"Northing"
))
TrainingMatrix <- as.matrix(scale(SubsetData))
#TrainingMatrix <- as.matrix(SubsetData)

GridDefinition <- somgrid(xdim = 10, ydim = 10, topo = "rectangular")

SomModel <- kohonen::supersom(data = TrainingMatrix, grid = GridDefinition, rlen = 1000, alpha = c(0.05, 0.001),
             keep.data = TRUE)
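# Number of clusters to cut the dendrogram into
groups <- 3
# Hierarchically cluster the SOM codebook vectors (hclust defaults to complete linkage)
tree.hc <- cutree(hclust(dist(SomModel$codes[[1]])), groups)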

plot(SomModel, type = "codes", bgcol = rainbow(groups)[tree.hc])
add.cluster.boundaries(SomModel, tree.hc)

result <- OrginalData
result$Cluster <- tree.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]

write.table(result, file = "SomOutput.csv", sep = ",", col.names = NA,
            qmethod = "double")

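For each donor, I know how often they donated in a year and the total amount they gave per year. Note that I could also generate finer-grained data (i.e. the number of donations per month and the monthly totals). I also know each donor's spatial location in North East England (see the commented-out "Easting" and "Northing" columns in the subset statement above).

The problem I am having is that the 'tree.hc' part of the code produces one large cluster (containing almost all donors) and a few very small clusters. Is there a way to get more evenly sized clusters?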
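To make the imbalance easier to reproduce and inspect, here is a minimal base-R sketch. It rests on several assumptions that are not part of my real setup: the data is simulated (log-normal yearly totals as a stand-in for skewed donation amounts), `cluster_sizes` is a hypothetical helper name, and the rows are clustered directly instead of the SOM codebook vectors so that the example runs without the kohonen package. It compares the cluster sizes from scaling plus the default complete linkage, as in the code above, with one possible alternative, a `log1p` transform followed by Ward linkage. Whether the alternative gives more even clusters depends on the data, so this is only a way to check, not a guaranteed fix.

```r
# Simulated stand-in data (assumption): right-skewed yearly totals for 500 donors.
set.seed(42)
n_donors <- 500
sums <- matrix(rlnorm(n_donors * 5, meanlog = 3, sdlog = 1.2), ncol = 5,
               dimnames = list(NULL, paste0("Sum", 2013:2017)))

# Hypothetical helper: hierarchical clustering of the rows of x,
# cut into k clusters, returning the number of rows per cluster.
cluster_sizes <- function(x, method, k = 3) {
  table(cutree(hclust(dist(x), method = method), k = k))
}

# Same recipe as in the question: scale only, complete linkage (the hclust default).
sizes_complete <- cluster_sizes(scale(sums), method = "complete")

# Possible alternative: reduce the skew with log1p, then scale, then Ward linkage.
sizes_ward <- cluster_sizes(scale(log1p(sums)), method = "ward.D2")

print(sizes_complete)
print(sizes_ward)
```

In the original code, the corresponding changes would be `scale(log1p(SubsetData))` for the training matrix and `hclust(dist(SomModel$codes[[1]]), method = "ward.D2")` for the tree. It may also be worth checking `table(SomModel$unit.classif)` to see whether most donors are mapped to only a handful of SOM units, since that alone would produce one dominant cluster at the donor level.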

0 answers:

No answers yet.