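I am building a market segmentation of consumers by clustering them into 3 groups. I am using the cluster package from CRAN with the CLARA clustering algorithm.
The data contains 12901 observations of 34 variables, with ordinal and NA values.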
The ordinal values do not have equal increments between categories. For example, in the HouseholdIncome column the categories are "0-15k", "15k-25k", "25k-35k", "35k-50k", "50k-75k", "75k-100k", "100k-125k", "125k-150k", "150k-175k", "175k-200k", "200k-250k", "250k+".
Every row has at least one non-missing value; no row consists entirely of NA:
> which(rowSums(is.na(Store2df))==ncol(Store2df))
named integer(0)
Here are the first five observations of the first seven variables:
> head(Store2df, n=5)
Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1 <NA> Male <NA> <NA> <NA> <NA> <NA>
2 45-54 Female <NA> <NA> <NA> <NA> <NA>
5 45-54 Female 75k-100k Married Yes Own 150k-200k
6 25-34 Male 75k-100k Married No Own 300k-350k
7 35-44 Female 125k-150k Married Yes Own 250k-300k
Here is the code for the clara call:
> library(cluster)
> #Clara algorithm
> #Set seed for reproducibility
> set.seed(1)
> #Changing medoids.x and keep.data = TRUE - new way
> client2.clara <- clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5,
+ sampsize = (2500), medoids.x = TRUE, keep.data = TRUE,
+ rngR = TRUE, pamLike = TRUE)
#Error in clara(Store2df, 3, metric = "manhattan", stand = FALSE, samples = 5, :
#Each of the random samples contains objects between which no distance can be computed.
Please let me know if I can provide more information.
From the CLARA source code:
ndyst = as.integer(if(metric == "manhattan") 2 else 1),
Answer 0 (score: 2)
"Each of the random samples contains objects between which no distance can be computed."
Take this error message seriously...
metric = "manhattan" is not defined for categorical variables.
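One possible way around this, sketched here rather than taken from the question, is to use a dissimilarity that is defined for ordered factors, such as Gower's coefficient from daisy() in the same cluster package, and to cluster the result with pam(). The toy data frame below is hypothetical and only stands in for Store2df; the level names are borrowed from the question. Note that daisy() still returns NA for a pair of rows that share no non-missing variable, so the toy data keeps one column complete.

```r
library(cluster)

set.seed(1)

# Hypothetical toy data standing in for Store2df (not the real data)
income_levels <- c("0-15k", "15k-25k", "25k-35k", "35k-50k")
age_levels    <- c("25-34", "35-44", "45-54")
n_toy <- 200

toy <- data.frame(
  # ordered = TRUE tells daisy() to treat the column as ordinal
  HouseholdIncome = factor(sample(c(income_levels, NA), n_toy, replace = TRUE),
                           levels = income_levels, ordered = TRUE),
  Age = factor(sample(c(age_levels, NA), n_toy, replace = TRUE),
               levels = age_levels, ordered = TRUE),
  # Nominal column without NAs, so every pair of rows shares a variable
  Gender = factor(sample(c("Male", "Female"), n_toy, replace = TRUE))
)

# Gower dissimilarity: ordinal columns are ranked, nominal columns use
# simple matching, and missing values are skipped pair by pair
gower_d <- daisy(toy, metric = "gower")

# Partition around medoids on the precomputed dissimilarities
gower_fit <- pam(gower_d, k = 3)
table(gower_fit$clustering)
```

With 12901 rows the full dissimilarity object has roughly 83 million entries, so on the real data it may be more practical to run this on a random subset first.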
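Manhattan and Euclidean distance operate on numeric vectors (which should also be on a linear scale, so not angles or labels, for example).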
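If you want to stay with clara() and Manhattan distance, one option is to turn each bracket into a number on a roughly linear scale first. The sketch below maps brackets to their midpoints; the midpoint values, the toy data, and the variable names are all assumptions made for illustration, and an open-ended bracket such as "250k+" would need a value chosen by hand. Again one column is kept free of NAs so that a distance exists for every pair of rows.

```r
library(cluster)

set.seed(1)

# Assumed bracket midpoints (income in thousands, age in years)
income_mid <- c("0-15k" = 7.5, "15k-25k" = 20, "25k-35k" = 30, "35k-50k" = 42.5)
age_mid    <- c("25-34" = 29.5, "35-44" = 39.5, "45-54" = 49.5)

# Hypothetical toy data standing in for two columns of Store2df
n_num  <- 300
income <- sample(c(names(income_mid), NA), n_num, replace = TRUE)
age    <- sample(names(age_mid), n_num, replace = TRUE)  # no NAs here

# Look up the midpoint for each label; an NA label stays NA
num_df <- data.frame(
  HouseholdIncome = unname(income_mid[income]),
  Age             = unname(age_mid[age])
)

# stand = TRUE standardizes each column so that income and age
# contribute on comparable scales
num_fit <- clara(num_df, 3, metric = "manhattan", stand = TRUE,
                 samples = 5, rngR = TRUE, pamLike = TRUE)
num_fit$medoids
```

Midpoints keep the unequal bracket widths visible to the distance, which plain integer codes 1, 2, 3, ... would not.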