Error computing distances with the CLARA function in R

Asked: 2014-09-01 18:51:43

Tags: r cluster-analysis

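I am building a market segmentation of consumers by clustering them into 3 groups. I am using the CLARA clustering algorithm from the cluster package on CRAN.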

The data contains 12,901 observations of 34 variables, with ordinal and NA values.

The ordinal values do not have equal increments between categories. For example, in the HouseholdIncome column the categories are "0-15k", "15k-25k", "25k-35k", "35k-50k", "50k-75k", "75k-100k", "100k-125k", "125k-150k", "150k-175k", "175k-200k", "200k-250k", "250k+".

Every row has at least one non-missing value.

> which(rowSums(is.na(Store2df))==ncol(Store2df))
named integer(0)
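
The check above only rules out rows that are entirely NA. A complementary check, sketched here on hypothetical toy data (`toy` is a stand-in, not the real `Store2df`), counts pairs of rows that have no observed variable in common. It rests on the assumption that such pairs are the ones between which no distance can be computed.

```r
# Hypothetical toy data standing in for Store2df (not the asker's data)
toy <- data.frame(
  Age             = c(NA, "45-54", "45-54", "25-34"),
  Gender          = c("Male", NA, "Female", "Male"),
  HouseholdIncome = c(NA, NA, "75k-100k", "75k-100k"),
  stringsAsFactors = TRUE
)

# 1 where a value is observed, 0 where it is NA
obs <- !is.na(toy)
storage.mode(obs) <- "double"

# shared[i, j] = number of variables observed in both row i and row j
shared <- tcrossprod(obs)

# Row pairs that share no observed variable at all
no_overlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
no_overlap
```

In the toy data, rows 1 and 2 are each partly observed but have no variable in common. On the full 12,901 rows the `shared` matrix is large, so it may be more practical to run this on a random subset of rows.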

Here are the first five observations of the first seven variables.

> head(Store2df, n=5)
    Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1  <NA>   Male            <NA>          <NA>               <NA>            <NA>            <NA>
2 45-54 Female            <NA>          <NA>               <NA>            <NA>            <NA>
5 45-54 Female        75k-100k       Married                Yes             Own       150k-200k
6 25-34   Male        75k-100k       Married                 No             Own       300k-350k
7 35-44 Female       125k-150k       Married                Yes             Own       250k-300k

Here is the code for the clara call:

> library(cluster)
> #Clara algorithm
> #Set seed for reproducibility
> set.seed(1)
> #Changing medoids.x and keep.data = TRUE - new way 
> client2.clara <- clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,
+                        sampsize = (2500), medoids.x = TRUE, keep.data = TRUE, 
+                        rngR = TRUE, pamLike = TRUE)
#Error in clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,  : 
  #Each of the random samples contains objects between which no distance can be computed.

Please let me know if I can provide more information.

From the CLARA source code:

ndyst = as.integer(if(metric == "manhattan") 2 else 1),

1 answer:

Answer 0: (score: 2)

  

Each of the random samples contains objects between which no distance can be computed.

Take this error message seriously...

metric = "manhattan"

is not defined for categorical variables.

Manhattan and Euclidean distances operate on numeric vectors (which should also be on a linear scale, e.g. not angles or labels).
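
One possible way around this, offered as a minimal sketch and not as the definitive fix, is to use a dissimilarity designed for mixed and ordinal data. The example below assumes the variables can be stored as ordered factors and uses `daisy()` with `metric = "gower"` followed by `pam()`, both from the cluster package. The data frame `toy` and its level vectors are hypothetical stand-ins for `Store2df`.

```r
library(cluster)

# Hypothetical toy data standing in for Store2df: ordered factors with an NA
age_levels    <- c("25-34", "35-44", "45-54")
income_levels <- c("0-15k", "15k-25k", "25k-35k", "35k-50k")
toy <- data.frame(
  Age = factor(c("25-34", "45-54", "35-44", "45-54", "25-34", "35-44"),
               levels = age_levels, ordered = TRUE),
  HouseholdIncome = factor(c("0-15k", "35k-50k", NA, "25k-35k", "15k-25k", "35k-50k"),
                           levels = income_levels, ordered = TRUE)
)

# Gower dissimilarity: ordered factors are handled as ordinal variables,
# and a missing value is left out of the pairs it affects
d <- daisy(toy, metric = "gower")

# Partitioning around medoids on the precomputed dissimilarities
fit <- pam(d, k = 2, diss = TRUE)
fit$clustering
```

Note that `pam()` needs the full dissimilarity object, which for 12,901 rows holds roughly 83 million pairwise values, so memory use is worth checking first. A pair of rows with no observed variable in common still yields an NA dissimilarity, which `pam()` does not accept.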