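I am using this code, which fits a self-organizing map (SOM) and then clusters the resulting prototype (codebook) vectors to define cluster boundaries: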
library(dplyr)
library(kohonen)
setwd('C:\\Users\\Bla\\Source\\Repos\\SomeExcitingRepo')
OrginalData <- read.table("IrisData.txt",
                          header = TRUE, sep = "\t")
SubsetData <- subset(OrginalData, select = c("SepalLength", "SepalWidth", "PetalLength", "PetalWidth"))
TrainingMatrix <- as.matrix(scale(SubsetData))
GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")
SomModel <- kohonen::supersom(data = TrainingMatrix, grid = GridDefinition,
                              rlen = 1000, alpha = c(0.05, 0.01),
                              keep.data = TRUE)
groups = 3
iris.hc = cutree(hclust(dist(SomModel$codes[[1]])), groups)
plot(SomModel, type = "codes", bgcol = rainbow(groups)[iris.hc])
add.cluster.boundaries(SomModel, iris.hc)
The data is the iris dataset, but that is only an example. The dataset is formatted as follows:
Uid SepalLength SepalWidth PetalLength PetalWidth Species
1 5.1 3.5 1.4 0.2 setosa
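Now let's assume this is an unseen dataset. I want to standardize it and present it to the SOM, then add to each row extra columns giving the SOM cluster number (1, 2, or 3, as in the example above) and the x and y coordinates of the winning node. For example: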
Uid SepalLength SepalWidth PetalLength PetalWidth Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 3 3 4
Answer 0 (score: 1)
You can use unit.classif to index into the clusters or the grid points:
result <- OrginalData
result$Cluster <- iris.hc[SomModel$unit.classif]
result$X <- SomModel$grid$pts[SomModel$unit.classif,"x"]
result$Y <- SomModel$grid$pts[SomModel$unit.classif,"y"]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Cluster X Y
1 5.1 3.5 1.4 0.2 setosa 1 1.5 2.5980762
2 4.9 3.0 1.4 0.2 setosa 1 1.0 3.4641016
3 4.7 3.2 1.3 0.2 setosa 1 1.0 3.4641016
4 4.6 3.1 1.5 0.2 setosa 1 1.0 3.4641016
5 5.0 3.6 1.4 0.2 setosa 1 1.0 1.7320508
6 5.4 3.9 1.7 0.4 setosa 1 1.5 0.8660254
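The X and Y values above are the plotting coordinates of the hexagonal grid, which is why they are fractional. If integer column and row indices like those in the question's example are wanted instead, they can be derived from the unit index, since somgrid numbers units along x first and then along y. A minimal sketch, assuming a hypothetical vector `unit` that stands in for `SomModel$unit.classif` (the names `Col` and `Row` are illustrative as well):

```r
library(kohonen)

# Same grid shape as in the question
GridDefinition <- somgrid(xdim = 4, ydim = 4, topo = "hexagonal")

# Hypothetical winning-unit indices, standing in for SomModel$unit.classif
unit <- c(1, 4, 5, 10, 16)

# Units are numbered along x first, then along y,
# so integer grid positions follow from modular arithmetic
Col <- (unit - 1) %% GridDefinition$xdim + 1
Row <- (unit - 1) %/% GridDefinition$xdim + 1

# Side-by-side view of integer indices and plotting coordinates
data.frame(unit, Col, Row,
           X = GridDefinition$pts[unit, "x"],
           Y = GridDefinition$pts[unit, "y"])
```

With the fitted model, `unit` would be replaced by `SomModel$unit.classif` and `GridDefinition` by `SomModel$grid`.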
Plotting these points on the map does not look that good, though:
points(jitter(result$X), jitter(result$Y), col=result$Species)
legend(5,0, legend=unique(result$Species), col=unique(result$Species), pch=1)
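Note that `SomModel$unit.classif` holds the winning units for the rows used in training. For rows that were not part of training, one option is to scale them with the training mean and standard deviation and pass them to `kohonen::map`. The following is a minimal sketch under several assumptions: the built-in `iris` data stands in for IrisData.txt, a random 30-row hold-out plays the role of the unseen data, `som()` is used in place of `supersom()`, and the names `holdout`, `train_raw`, `new_raw`, `NewMatrix`, `mapped`, `winner` and `result_new` are illustrative only.

```r
library(kohonen)

set.seed(42)  # reproducible split and SOM fit

# Assumption: a random hold-out of the built-in iris data acts as "unseen" rows
holdout   <- sample(nrow(iris), 30)
train_raw <- iris[-holdout, 1:4]
new_raw   <- iris[holdout, ]

# Fit the SOM and cluster its codebook vectors, as in the question
TrainingMatrix <- as.matrix(scale(train_raw))
SomModel <- som(TrainingMatrix,
                grid = somgrid(xdim = 4, ydim = 4, topo = "hexagonal"),
                rlen = 1000, alpha = c(0.05, 0.01))
groups  <- 3
iris.hc <- cutree(hclust(dist(SomModel$codes[[1]])), groups)

# Scale the new rows with the TRAINING centre and scale, not their own
NewMatrix <- scale(as.matrix(new_raw[, 1:4]),
                   center = attr(TrainingMatrix, "scaled:center"),
                   scale  = attr(TrainingMatrix, "scaled:scale"))

# Find the winning unit for every new row
mapped <- kohonen::map(SomModel, newdata = NewMatrix)
winner <- mapped$unit.classif

# Attach cluster number and grid coordinates of the winning unit
result_new <- new_raw
result_new$Cluster <- iris.hc[winner]
result_new$X <- SomModel$grid$pts[winner, "x"]
result_new$Y <- SomModel$grid$pts[winner, "y"]
head(result_new)
```

Reusing the training centre and scale matters here: scaling the new rows on their own statistics would place them in a different coordinate system from the codebook vectors.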