
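I am trying to build a hierarchical clustering tree in R with the varclus function (variable clustering, from the Hmisc package). Note that I want to cluster the variables (features), not the observations in my dataset. The problem is that my dataset is mixed: it contains both categorical variables (with two or more categories) and numeric variables, and I don't know how to handle the categorical variables in the clustering.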

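I would like each categorical variable to appear as a single variable in my tree (as in Figure 22 of this paper). However, when I run varclus, it splits every categorical variable with more than two categories into several separate variables: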

[Figure: Hierarchical Clustering Tree]

For example, the categorical variable domain is split into a set of separate variables (domainSystemSoftware, domainWebLibraries, etc.). Here is my current code:

library(Hmisc)  # provides varclus() and redun()

independent.variables <- projects[, c("age", "languages", "forks", "stars", "core_contributors", "owner_type", "license", "domain", "has_readme", "has_contributing")]

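# Build the hierarchical clustering of the variables and plot it.
hierarchal.tree <- varclus(~ ., data = independent.variables)
spearman.threshold <- 0.7
plot(hierarchal.tree)
# Draw the threshold line on the dendrogram.
abline(h = 1 - spearman.threshold, col = "red", lty = 2)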

# Redundancy analysis on the same set of variables.
redundant.variables <- redun(~ ., data = independent.variables, nk = 0)
print(redundant.variables)
# Redundant variables (R^2 > 0.9) can be removed.
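
The split into domainSystemSoftware, domainWebLibraries and so on looks consistent with how R expands a formula into a design matrix: a factor with k levels becomes k - 1 indicator columns. The sketch below reproduces that naming with base R only. The toy data frame and the "Application" level are hypothetical, and whether varclus performs exactly this expansion internally is an assumption here.

```r
# Toy data; the values and the "Application" level are hypothetical.
toy <- data.frame(
  stars  = c(10, 25, 3, 40, 7, 18),
  domain = factor(c("Application", "SystemSoftware", "WebLibraries",
                    "Application", "WebLibraries", "SystemSoftware"))
)

# Expanding the formula yields one indicator column per non-reference level.
# With the default treatment contrasts, the first level (alphabetically)
# is the reference and gets no column of its own.
design <- model.matrix(~ ., data = toy)
print(colnames(design))
```

The column names are the variable name pasted to the level name, which matches the labels that show up in the tree.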

Any suggestions on how to solve this problem?
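
One possible direction, given only as a sketch under stated assumptions and not as a confirmed solution, is to build a variable-by-variable similarity matrix by hand, so that each categorical variable stays a single row and column, and then cluster that matrix. The helper assoc2 and the choice of association measures (squared Spearman correlation, squared Cramér's V, and the R^2 of a rank-based one-way ANOVA) are assumptions introduced for illustration. They are not what varclus computes, and the toy data is hypothetical.

```r
# Squared association between two variables, chosen by type (illustrative choice):
#   numeric-numeric: squared Spearman correlation
#   factor-factor:   squared Cramer's V
#   numeric-factor:  R^2 of a one-way ANOVA on the ranks of the numeric variable
assoc2 <- function(x, y) {
  if (is.numeric(x) && is.numeric(y)) {
    return(cor(x, y, method = "spearman")^2)
  }
  if (is.factor(x) && is.factor(y)) {
    tab  <- table(droplevels(x), droplevels(y))
    chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
    return(unname(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
  }
  if (is.factor(x)) {  # swap so that x is numeric and y is the factor
    tmp <- x; x <- y; y <- tmp
  }
  summary(lm(rank(x) ~ y))$r.squared
}

# Hypothetical toy data with mixed variable types.
set.seed(1)
n <- 60
toy <- data.frame(
  age        = rexp(n),
  stars      = rpois(n, 20),
  domain     = factor(sample(c("Application", "SystemSoftware", "WebLibraries"),
                             n, replace = TRUE)),
  has_readme = factor(sample(c("yes", "no"), n, replace = TRUE))
)

# Pairwise similarity matrix: one row and one column per original variable.
vars <- names(toy)
sim  <- sapply(vars, function(a) sapply(vars, function(b) assoc2(toy[[a]], toy[[b]])))

# Cluster on dissimilarity = 1 - similarity.
tree <- hclust(as.dist(1 - sim), method = "complete")
print(tree)
```

Calling plot(tree) would draw the dendrogram with one leaf per original variable. The Hmisc documentation also lists a type = "similarity.matrix" argument for varclus, which may accept such a matrix directly; that has not been verified here.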

Thanks.

How to use variable clustering on categorical data (varclus)

0 answers: