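I have a small dataset with 11 columns and 50 rows (plus a header row). I am trying to apply the kmodes clustering method (from the klaR package) in R to this text matrix. Unfortunately, I get an error that I don't understand: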
kmodes(data, 5)
Error in x[[jj]][iseq] <- vjj : replacement has length zero
What is going wrong? If I change the call to:
kmodes(na.omit(data), 5)
the error becomes:
'names' attribute [2] must be the same length as the vector [0]
The data looks like this:
Type A1 A3 B5 C1 C2 C4 D5 D E2 E1
1 A pos pos neg <NA> <NA> <NA> <NA> <NA> <NA> U
2 A pos pos neg <NA> <NA> <NA> pos <NA> <NA> U
3 A U pos pos <NA> <NA> <NA> <NA> <NA> pos <NA>
4 A pos pos neg <NA> <NA> <NA> <NA> <NA> neg U
5 A pos pos neg <NA> <NA> pos <NA> <NA> neg <NA>
6 A pos pos <NA> <NA> <NA> <NA> <NA> neg neg neg
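Answer 0 (score: 0)
This happens because every row contains at least one NA value. na.omit drops every row that has an NA, so it leaves you with an empty data frame, like this: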
# [1] Type A1 A3 B5 C1 C2 C4 D5 D E2 E1
#<0 rows> (or 0-length row.names)
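A quick way to confirm this diagnosis is to count the complete rows before clustering. The sketch below uses only base R; the `data` object is a hypothetical reconstruction of the six rows printed in the question, not the asker's full 50-row dataset.

```r
# Hypothetical reconstruction of the six rows shown in the question.
data <- data.frame(
  Type = rep("A", 6),
  A1 = c("pos", "pos", "U", "pos", "pos", "pos"),
  A3 = rep("pos", 6),
  B5 = c("neg", "neg", "pos", "neg", "neg", NA),
  C1 = NA_character_,
  C2 = NA_character_,
  C4 = c(NA, NA, NA, NA, "pos", NA),
  D5 = c(NA, "pos", NA, NA, NA, NA),
  D  = c(NA, NA, NA, NA, NA, "neg"),
  E2 = c(NA, NA, "pos", "neg", "neg", "neg"),
  E1 = c("U", "U", NA, "U", NA, "neg"),
  stringsAsFactors = FALSE
)

# Number of rows that contain no NA at all.
n_complete <- sum(complete.cases(data))

# Number of NA values in each row.
na_per_row <- rowSums(is.na(data))

# Number of rows that survive na.omit().
n_left <- nrow(na.omit(data))

print(n_complete)
print(na_per_row)
print(n_left)
```

If `n_complete` is zero, `na.omit(data)` has nothing left to return, which matches the second error in the question.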
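However, if you simply replace all NA values in the data with an empty string "", it works (na.fill here is the function from the zoo package, so that package needs to be loaded). Like this:
kmodes(na.fill(data, fill=""), 5)
na.fill(data, fill="") gives you the following data: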
# Type A1 A3 B5 C1 C2 C4 D5 D E2 E1
#[1,] "A" "pos" "pos" "neg" "" "" "" "" "" "" "U"
#[2,] "A" "pos" "pos" "neg" "" "" "" "pos" "" "" "U"
#[3,] "A" "U" "pos" "pos" "" "" "" "" "" "pos" ""
#[4,] "A" "pos" "pos" "neg" "" "" "" "" "" "neg" "U"
#[5,] "A" "pos" "pos" "neg" "" "" "pos" "" "" "neg" ""
#[6,] "A" "pos" "pos" "" "" "" "" "" "neg" "neg" "neg"
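If you would rather not depend on an extra package for the replacement, the same idea can be sketched in base R. This is a minimal illustration on a small hypothetical data frame (`df` and its values are made up for the example) and assumes the columns are character vectors rather than factors.

```r
# Small hypothetical data frame with character columns and some NAs.
df <- data.frame(
  A1 = c("pos", "U", NA),
  B5 = c("neg", NA, "pos"),
  stringsAsFactors = FALSE
)

# Replace every NA with an empty string, so "missing" becomes
# an ordinary category instead of a missing value.
filled <- df
filled[is.na(filled)] <- ""

print(filled)
print(anyNA(filled))
```

Note that this treats "missing" as a category of its own, so two rows that are both missing a value in the same column count as matching on that column. Whether that is appropriate depends on what the NA values mean in your data.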
The output of kmodes will now be:
#K-modes clustering with 5 clusters of sizes 1, 1, 2, 1, 1
#Cluster modes:
# Type A1 A3 B5 C1 C2 C4 D5 D E2 E1
#1 A U pos pos pos
#2 A pos pos neg neg U
#3 A pos pos neg U
#4 A pos pos neg neg neg
#5 A pos pos neg pos neg
#Clustering vector:
#[1] 3 3 1 2 5 4
#Within cluster simple-matching distance by cluster:
#[1] 0 0 1 0 0
#Available components:
#[1] "cluster" "size" "modes" "withindiff" "iterations" "weighted"
Hope that helps.