Clustering by group

Asked: 2017-05-23 08:48:30

Tags: r cluster-analysis

How can I perform clustering within groups? As an example, take this Pokemon dataset on Kaggle.

A sample of the dataset looks like this (some fields have been changed to mimic my data):

Name                        Type I  Type II
Bulbasaur                   Grass   Poison  
Bulbasaur 2                 Grass   Poison  
Venusaur                    Grass   Not Null
VenusaurMega Venusaur       Grass   Not Null
...
Charizard                   Fire    Flying
CharizardMega Charizard X   Fire    Dragon

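Assuming my dataset contains no null values, how can I group by the Type I and Type II columns, and then cluster the rows within each group by the similarity between their names?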

The output should look like this:

Name                        Type I  Type II  Cluster
Bulbasaur                   Grass   Poison   1
Bulbasaur 2                 Grass   Poison   1
Venusaur                    Grass   Not Null 2
VenusaurMega Venusaur       Grass   Not Null 2
...
Charizard                   Fire    Flying   3
CharizardMega Charizard X   Fire    Dragon   4

I tried something like the approach shown here, but it does not work with the NbClust function I am using:

clust <- NbClust(data, diss= string_dist, distance=NULL, min.nc = 2, max.nc = 125, method="ward.D2", index="ch")
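The call above relies on a string_dist object that the question does not define. As a rough sketch only, one way to build such a dissimilarity object is with base R's adist (Levenshtein edit distance) converted by as.dist. The names_vec vector and the choice of k = 2 are assumptions made for illustration, and hclust stands in for NbClust here.

```r
# Assumed sample of names taken from the table in the question
names_vec <- c("Bulbasaur", "Bulbasaur 2", "Venusaur", "VenusaurMega Venusaur")

# Pairwise Levenshtein distances, converted to a "dist" object
string_dist <- as.dist(adist(names_vec))

# Hierarchical clustering on the string distances (same linkage as in the question)
hc <- hclust(string_dist, method = "ward.D2")

# Cut the tree into an assumed number of clusters (k = 2 is illustrative only)
cl <- cutree(hc, k = 2)
```

Raw edit distance is sensitive to string length, so long names such as "VenusaurMega Venusaur" may end up far from their shorter base name; normalizing by length is one possible adjustment.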

2 answers:

Answer 0 (score: 1)

You can do this with library(data.table), using .GRP to number each group:

library(data.table)
df <- fread("
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
")

setDT(df, key=c("Type 1","Type 2"))[, Cluster:=.GRP, by = key(df)][]
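The line above assigns one id per Type combination, but it does not yet cluster by name similarity. A minimal base R sketch of that second step, under stated assumptions, could look like the following. The helper cluster_names, the normalization by the longer name's length, and the cut height h = 0.7 are all hypothetical choices for illustration, not part of the original answer; the sample rows are the ones shown in the question.

```r
# Sample rows from the question
df <- data.frame(
  Name = c("Bulbasaur", "Bulbasaur 2", "Venusaur",
           "VenusaurMega Venusaur", "Charizard",
           "CharizardMega Charizard X"),
  `Type I` = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  `Type II` = c("Poison", "Poison", "Not Null", "Not Null", "Flying", "Dragon"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: cluster the names of ONE group by edit distance.
# h is an assumed cut height on the length-normalized distance.
cluster_names <- function(nm, h = 0.7) {
  if (length(nm) < 2) return(rep(1L, length(nm)))   # hclust needs >= 2 items
  d <- adist(nm)                                    # Levenshtein distances
  d <- d / outer(nchar(nm), nchar(nm), pmax)        # scale by the longer name
  hc <- hclust(as.dist(d), method = "average")
  as.integer(cutree(hc, h = h))
}

# Run the helper separately inside every Type I / Type II combination
df$Subcluster <- ave(seq_len(nrow(df)), df$`Type I`, df$`Type II`,
                     FUN = function(i) cluster_names(df$Name[i]))

# Combine group and within-group cluster into one running cluster id
key <- paste(df$`Type I`, df$`Type II`, df$Subcluster, sep = "\r")
df$Cluster <- match(key, unique(key))
```

On these six rows the Cluster column comes out as 1, 1, 2, 2, 3, 4, matching the output requested in the question. The threshold would need tuning on real data.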

Edit (see comments)


Answer 1 (score: 0)

We can use base R:

df$cluster <- with(df, match(`Type II`, unique(`Type II`)))
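Note that the line above numbers the rows by Type II alone. If the id should reflect both columns, as the question asks, one possible variation is to match on a combined key. This is only a sketch: the sample data frame is an assumption rebuilt from the question's table, and the "\r" separator is an arbitrary choice that is unlikely to occur inside the values.

```r
# Assumed sample data, rebuilt from the table in the question
df <- data.frame(
  Name = c("Bulbasaur", "Bulbasaur 2", "Venusaur",
           "VenusaurMega Venusaur", "Charizard",
           "CharizardMega Charizard X"),
  `Type I` = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  `Type II` = c("Poison", "Poison", "Not Null", "Not Null", "Flying", "Dragon"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Build one key per row from both type columns
key <- with(df, paste(`Type I`, `Type II`, sep = "\r"))

# Number the keys in order of first appearance
df$cluster <- match(key, unique(key))
```

Using match against unique keeps the ids in order of appearance, whereas a factor-based approach would number the groups in sorted level order.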