使用雏菊进行聚类分析

时间:2016-02-10 22:12:16

标签: r cluster-computing analysis r-daisy

我尝试使用包daisy对RStudio执行分层聚类分析。这是我的数据集:

data.frame':341 obs. of  28 variables:
$ Impo_Env : Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 2 3 2 3 2 3 3 2 3 ...
$ ComparativePriority_IAS: Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 1 3 2 3 2 3 2 3 2 ...
$ Strategy_Eradication: Ord.factor w/ 3 levels "No intervention"<..: 3 2 3 2 3 2 3 2 2 3 ...
$ Knowl_BiodivLoss: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 2 2 2 ...
$ Control_Trade: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Engagement_Retail: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_PastProj: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 2 1 ...
$ Priority_IAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Eradic: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 1 2 2 1 ...
$ Alert_CFS: Factor w/ 2 levels "0","1": 1 2 1 2 1 2 2 1 2 1 ...
$ Alert_Municipality: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Park: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 1 ...
$ Alert_Police: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Firemen: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
$ Supp_AuthorityIAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Env: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ Info_Tv: Factor w/ 2 levels "0","1": 2 2 1 2 2 2 2 1 2 1 ...
$ Info_Web: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 1 2 2 ...
$ Info_Radio: Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
$ Info_Magazines: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 1 2 1 1 ...
$ Info_School: Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 2 ...
$ Blacklist: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Workshop: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 2 2 1 1 ...
$ SuppFin_FutProj: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 2 2 2 2 ...
$ Tourist_dummy: Factor w/ 2 levels "0","1": 1 1 1 2 2 1 1 1 2 1 ...
$ Gender: Factor w/ 2 levels "Female","Male": 1 2 1 2 1 1 2 2 2 1 ...
$ logIASknown: num  2.89 2.94 2.89 2.56 3.14 ...
$ Age: int  20 41 14 10 26 33 19 59 23 16 ...

我想使用daisy的欧几里德距离,但是当我跑

daisy(fuu, metric = c("euclidean"), type=list(ordratio = c(1,2,3), asymm=c(4:24), symm=c(25,26)))

输出不好。使用高尔的距离代替欧几里德距离:

  

警告信息:在雏菊中(fuu,metric = c(&#34;欧几里得&#34;),输入= list(ordratio = c(1,:混合变量,公制&#34; gower&#34;是自动使用

我该如何解决?

1 个答案:

答案 0 :(得分:0)

如集群包中包含的daisy函数文档中的“详细信息”部分所述:

  

对名义,序数和(a)对称二进制数据的处理是   通过使用Gower的一般相异系数来实现   (1971年)。如果x包含这些数据类型的任何列,则两个参数   公制和标准将被忽略,Gower的系数将被使用   作为指标

换句话说,对于要计算的欧氏度量(距离作为根的平方和),输入列必须是数值(模式)变量(即x为矩阵时的所有列),因此被识别为区间缩放变量,而不是名义(类因子列)变量或序数(类有序列)变量。在类型参数中指定变量类型不会改变这一事实。

在这些前提下,并假设它对28个变量都有意义,尽管有些是定性二进制,你可以尝试用as.numeric转换它们然后继续,原因是: 混合变量metric“gower”覆盖自动使用